qtbase/src/corelib/serialization
Thiago Macieira 8764a0c79d CBOR: fix sorting of UTF16-to-UTF16 strings
This amends commit 394788c68efacdec2676988b4b4ff207b20557f2 (its
ChangeLog applies to this commit too). That fixed sorting of UTF8-to-
UTF16, but when adding more unit tests, I've discovered that some UTF-16
strings also sorted incorrectly. There were two problems:

First, we were assuming that we could rely on the UTF-16 length as a
proxy for the UTF-8 one, but that's not true for some cases:
* both 1-, 2- and 3-codepoint UTF-8 sequences are 1 codepoint
  in UTF-16, so some strings would have identical UTF-16 length
* 4-codepoint UTF-8 sequences shrink to 2-codepoint UTF-16 ones
  (2:1) but 3-codepoint UTF-8 sequences shrink to 1 (3:1), so
  some strings would be longer in UTF-16 but shorter in UTF-8.

Second, QtPrivate::compareStrings performs UTF-16 codepoint comparisons
not Unicode character ones, so surrogate pairs were sorting before
U+E000 to U+FFFF.

To fix all of this, we need to decode the UTF-16 string into UTF-32 and
calculate the length of that in UTF-8 to be sure we have the sorting
order right.

Since this is a slight behavior change with a performance penalty, I am
choosing to backport only to 6.7. The penalty mostly does not apply to
6.8 due to commit 61556627f25e7c7acbfcc5e54127a392b5239977.

Change-Id: If1bf59ecbe014b569ba1fffd17c4c4ddcc874aac
Reviewed-by: Ivan Solovev <ivan.solovev@qt.io>
(cherry picked from commit d4c7da9a07dc1434692fe08a61ba22c794574c4f)
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2024-04-23 19:03:23 -07:00
..
2022-05-16 16:37:38 +02:00
2022-05-16 16:37:38 +02:00
2022-05-16 16:37:38 +02:00
2022-05-16 16:37:38 +02:00
2022-05-16 16:37:38 +02:00
2022-05-16 16:37:38 +02:00
2023-12-14 03:01:40 +00:00
2022-05-16 16:37:38 +02:00
2022-05-16 16:37:38 +02:00
2023-05-01 21:52:22 +02:00
2023-04-18 13:39:26 +00:00