This amends commit 394788c68efacdec2676988b4b4ff207b20557f2 (its
ChangeLog applies to this commit too). That fixed sorting of UTF8-to-
UTF16, but when adding more unit tests, I've discovered that some UTF-16
strings also sorted incorrectly. There were two problems:
First, we were assuming that we could rely on the UTF-16 length as a
proxy for the UTF-8 one, but that's not true for some cases:
* both 1-, 2- and 3-codepoint UTF-8 sequences are 1 codepoint
in UTF-16, so some strings would have identical UTF-16 length
* 4-codepoint UTF-8 sequences shrink to 2-codepoint UTF-16 ones
(2:1) but 3-codepoint UTF-8 sequences shrink to 1 (3:1), so
some strings would be longer in UTF-16 but shorter in UTF-8.
Second, QtPrivate::compareStrings performs UTF-16 codepoint comparisons
not Unicode character ones, so surrogate pairs were sorting before
U+E000 to U+FFFF.
To fix all of this, we need to decode the UTF-16 string into UTF-32 and
calculate the length of that in UTF-8 to be sure we have the sorting
order right.
Since this is a slight behavior change with a performance penalty, I am
choosing to backport only to 6.7. The penalty mostly does not apply to
6.8 due to commit 61556627f25e7c7acbfcc5e54127a392b5239977.
Change-Id: If1bf59ecbe014b569ba1fffd17c4c4ddcc874aac
Reviewed-by: Ivan Solovev <ivan.solovev@qt.io>
(cherry picked from commit d4c7da9a07dc1434692fe08a61ba22c794574c4f)
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>