QString: update docs to prefer "UTF-32" over "UCS-4"

The two encodings are now identical, but the name UTF-32 is preferred over UCS-4.

The original ISO-10646 UCS-4 encoding allowed all 31-bit code units,
from 0 to 0x7FFFFFFF[1], including those above 0x10FFFF; values above
0x1FFFFF correspond to UTF-8's original five- and six-byte sequences.
Unicode does not allow that and restricts every UTF to the range that
UTF-16 can represent (U+0000 to U+10FFFF).
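To make the range difference concrete, here is a small standalone sketch. The helper names are hypothetical and not part of Qt. The UCS-4 bound comes from the text above. The UTF-32 bound (scalar values only, so surrogates are excluded) and the UTF-8 byte counts follow the Unicode definition and the original RFC 2279 UTF-8 scheme.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; they are not Qt API.

// Original ISO-10646 UCS-4: any 31-bit value was a permitted code unit.
constexpr bool isOriginalUcs4CodeUnit(std::uint32_t v)
{
    return v <= 0x7FFFFFFFu;
}

// UTF-32 as defined by Unicode: only scalar values, i.e. U+0000..U+10FFFF
// minus the surrogate range U+D800..U+DFFF that UTF-16 reserves.
constexpr bool isUtf32CodeUnit(std::uint32_t v)
{
    return v <= 0x10FFFFu && !(v >= 0xD800u && v <= 0xDFFFu);
}

// Bytes the original (RFC 2279) UTF-8 scheme needed for a 31-bit value.
// Modern UTF-8 (RFC 3629) stops at four bytes and U+10FFFF.
constexpr int originalUtf8Length(std::uint32_t v)
{
    if (v < 0x80u) return 1;
    if (v < 0x800u) return 2;
    if (v < 0x10000u) return 3;
    if (v < 0x200000u) return 4;
    if (v < 0x4000000u) return 5;
    return 6; // up to 0x7FFFFFFF
}

// The largest UCS-4 value is not valid UTF-32 and needed six bytes.
static_assert(isOriginalUcs4CodeUnit(0x7FFFFFFFu) && !isUtf32CodeUnit(0x7FFFFFFFu));
static_assert(originalUtf8Length(0x7FFFFFFFu) == 6);
// The largest UTF-32 value fits in four bytes.
static_assert(isUtf32CodeUnit(0x10FFFFu) && originalUtf8Length(0x10FFFFu) == 4);
```

Values from 0x110000 to 0x1FFFFF are the edge case: they still fit in four original UTF-8 bytes, yet neither UTF-16 nor modern UTF-32 can carry them.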

Renaming the functions is left as an exercise for the reader.

[1] https://en.wikipedia.org/wiki/UTF-32#History

Pick-to: 6.8
Change-Id: I2f29db62b974cb689585fffd9a6434ae252a7651
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
(cherry picked from commit 973d0c4c5160200c188f81da5df064510315f22d)
Reviewed-by: Qt Cherry-pick Bot <cherrypick_bot@qt-project.org>
Thiago Macieira 2024-12-06 08:28:43 -08:00 committed by Qt Cherry-pick Bot
parent 31e3012296
commit b143215c54


@@ -2404,7 +2404,7 @@ encoded in \1, and is converted to QString using the \2 function.
 Reads the first \a size code units of the \c wchar_t array to whose start
 \a string points, converting them to Unicode and returning the result as
-a QString. The encoding used by \c wchar_t is assumed to be UCS-4 if the
+a QString. The encoding used by \c wchar_t is assumed to be UTF-32 if the
 type's size is four bytes or UTF-16 if its size is two bytes.
 If \a size is -1 (default), the \a string must be '\\0'-terminated.
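The wchar_t rule in this hunk can be illustrated with a standalone sketch. `decodeWChar` is a hypothetical name, not Qt's implementation. It decodes as UTF-16 when `wchar_t` is two bytes wide and as UTF-32 when it is four, and it treats a size of -1 as "read up to the terminating null". The hunk does not say how unpaired surrogates are handled, so this sketch simply passes them through.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not Qt API: decode a wchar_t array into code points,
// choosing UTF-16 or UTF-32 from sizeof(wchar_t) as the docs describe.
// size == -1 means the array is terminated by L'\0'.
std::vector<char32_t> decodeWChar(const wchar_t *str, std::ptrdiff_t size = -1)
{
    if (size < 0)
        size = static_cast<std::ptrdiff_t>(std::char_traits<wchar_t>::length(str));

    std::vector<char32_t> out;
    for (std::ptrdiff_t i = 0; i < size; ++i) {
        const char32_t c = static_cast<char32_t>(str[i]);
        if constexpr (sizeof(wchar_t) == 2) {
            // UTF-16 (e.g. Windows): join a high surrogate with the following low one.
            if (c >= 0xD800 && c <= 0xDBFF && i + 1 < size) {
                const char32_t low = static_cast<char32_t>(str[i + 1]);
                if (low >= 0xDC00 && low <= 0xDFFF) {
                    out.push_back(0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00));
                    ++i; // the low surrogate has been consumed
                    continue;
                }
            }
        }
        // UTF-32 (most Unix systems): one wchar_t is one code point.
        // Assumption: unpaired surrogates are passed through unchanged.
        out.push_back(c);
    }
    return out;
}
```

Because the branch is selected at compile time, the same call yields the same code points on both kinds of platform, for example a single U+1F600 for `L"\U0001F600"`.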
@@ -2443,7 +2443,7 @@ qsizetype QString::toUcs4_helper(const char16_t *uc, qsizetype length, char32_t
 Fills the \a array with the data contained in this QString object.
 The array is encoded in UTF-16 on platforms where
-wchar_t is 2 bytes wide (e.g. windows) and in UCS-4 on platforms
+wchar_t is 2 bytes wide (e.g. windows) and in UTF-32 on platforms
 where wchar_t is 4 bytes wide (most Unix systems).
 \a array has to be allocated by the caller and contain enough space to
@@ -5846,8 +5846,8 @@ static QList<uint> qt_convert_to_ucs4(QStringView string);
 Returns a UCS-4/UTF-32 representation of the string as a QList<uint>.
-UCS-4 is a Unicode codec and therefore it is lossless. All characters from
-this string will be encoded in UCS-4. Any invalid sequence of code units in
+UTF-32 is a Unicode codec and therefore it is lossless. All characters from
+this string will be encoded in UTF-32. Any invalid sequence of code units in
 this string is replaced by the Unicode's replacement character
 (QChar::ReplacementCharacter, which corresponds to \c{U+FFFD}).
@@ -5879,8 +5879,8 @@ static QList<uint> qt_convert_to_ucs4(QStringView string)
 Returns a UCS-4/UTF-32 representation of \a string as a QList<uint>.
-UCS-4 is a Unicode codec and therefore it is lossless. All characters from
-this string will be encoded in UCS-4. Any invalid sequence of code units in
+UTF-32 is a Unicode codec and therefore it is lossless. All characters from
+this string will be encoded in UTF-32. Any invalid sequence of code units in
 this string is replaced by the Unicode's replacement character
 (QChar::ReplacementCharacter, which corresponds to \c{U+FFFD}).
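The replacement rule in the two toUcs4() hunks above can be sketched outside Qt. `utf16ToUtf32` is a hypothetical helper, not Qt's code. It assumes that "invalid sequence" means an unpaired surrogate and that each one becomes a single U+FFFD; Qt's exact grouping of invalid units may differ.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not Qt API: convert UTF-16 to UTF-32, replacing each
// unpaired surrogate with U+FFFD (QChar::ReplacementCharacter in Qt).
std::vector<char32_t> utf16ToUtf32(const std::u16string &in)
{
    constexpr char32_t Replacement = 0xFFFD;
    std::vector<char32_t> out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t c = in[i];
        const bool isHigh = c >= 0xD800 && c <= 0xDBFF;
        const bool isLow = c >= 0xDC00 && c <= 0xDFFF;
        if (isHigh && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one supplementary code point.
            out.push_back(0x10000 + ((c - 0xD800) << 10) + (in[i + 1] - 0xDC00));
            ++i; // skip the low surrogate we just used
        } else if (isHigh || isLow) {
            out.push_back(Replacement); // lone high or low surrogate
        } else {
            out.push_back(c); // BMP code point, copied as-is
        }
    }
    return out;
}
```

Every valid UTF-16 sequence maps to exactly one UTF-32 code unit, which is why the conversion is lossless for well-formed input; only ill-formed input is altered.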
@@ -6103,7 +6103,7 @@ QString QString::fromUtf16(const char16_t *unicode, qsizetype size)
 \since 5.3
 Returns a QString initialized with the first \a size characters
-of the Unicode string \a unicode (ISO-10646-UCS-4 encoded).
+of the Unicode string \a unicode (encoded as UTF-32).
 If \a size is -1 (default), \a unicode must be \\0'-terminated.
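Going the other way, fromUcs4() has to produce UTF-16, the encoding QString stores internally. The sketch below is hypothetical (`utf32ToUtf16` is not Qt API). It mirrors the documented size convention, where -1 means null-terminated. Mapping surrogates and values above U+10FFFF to U+FFFD is an assumption of this sketch; the documentation above does not specify it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not Qt API: encode UTF-32 as UTF-16.
// size == -1 means the input is terminated by U'\0'.
// Assumption: non-scalar values (surrogates, > U+10FFFF) become U+FFFD.
std::u16string utf32ToUtf16(const char32_t *unicode, std::ptrdiff_t size = -1)
{
    if (size < 0)
        size = static_cast<std::ptrdiff_t>(std::char_traits<char32_t>::length(unicode));

    std::u16string out;
    out.reserve(static_cast<std::size_t>(size));
    for (std::ptrdiff_t i = 0; i < size; ++i) {
        char32_t c = unicode[i];
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            c = 0xFFFD;
        if (c < 0x10000) {
            out.push_back(static_cast<char16_t>(c)); // BMP: one code unit
        } else {
            c -= 0x10000; // 20 bits remain, split across two surrogates
            out.push_back(static_cast<char16_t>(0xD800 + (c >> 10)));   // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (c & 0x3FF))); // low surrogate
        }
    }
    return out;
}
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00. The upper bound U+10FFFF is exactly the largest value a surrogate pair can express, which is the UTF-16 range limit the commit message refers to.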
@@ -9413,7 +9413,7 @@ QString &QString::setRawData(const QChar *unicode, qsizetype size)
 /*! \fn QString QString::fromStdU32String(const std::u32string &str)
 \since 5.5
-\include qstring.cpp {from-std-string} {UCS-4} {fromUcs4()}
+\include qstring.cpp {from-std-string} {UTF-32} {fromUcs4()}
 \sa fromUcs4(), fromStdWString(), fromStdU16String()
 */