107 Commits

Author SHA1 Message Date
Mårten Nordheim
85899ff181 Update UCD to Unicode 16.0.0
They added some new scripts.

There were a few changes to the line break algorithm,
most notably there is more rules that require more context than before.
While not major, there was some shuffling and additions to our
implementation to match the new rules.

IDNA test data now disallows the trailing dot/empty root label,
technically to be toggled off by an option that controls a few things,
but we don't have options. For test-data they changed the format a
little - "" is used to mean empty string, while a blank segment is
null/no string, update the parser to read this.

[ChangeLog][Third-Party Code] Updated the Unicode Character Database to
UCD revision 34/Unicode 16.

Fixes: QTBUG-132902
Task-number: QTBUG-132851
Pick-to: 6.9 6.8 6.5
Change-Id: I4569703659f6fd0f20943110a03301c1cf8cc1ed
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2025-02-10 18:36:55 +01:00
Mårten Nordheim
62685375a2 Unicode tool: use unsigned values for the bitfields
On MSVC the values stored end up as negative.

Task-number: QTBUG-132902
Pick-to: 6.9 6.8 6.5
Change-Id: I963c57c34479041911c1364a1100d04998bdfaed
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2025-02-04 15:08:09 +01:00
Mårten Nordheim
f7d0366207 Unicode tool: print values using portable types
ssize_t is not universal; fails to compile in Windows.

Task-number: QTBUG-132902
Pick-to: 6.9
Change-Id: I4b8f45cba32202329ac085c7caa0a8c19a11c621
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2025-02-04 15:08:09 +01:00
Mårten Nordheim
e99d5c6268 Unicode tool: handle required QFile::open return
Fixing warnings/errors about QFile::open() return value not being
checked, and print the name of the file and the error message that
occurred.

Task-number: QTBUG-132902
Pick-to: 6.9
Change-Id: I099b300b5fd4563334fa547ffa365ec3f68e08cf
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2025-02-04 15:08:09 +01:00
Lucie Gérard
4c83657448 Add REUSE.toml files
Those files are read by reuse to complement or override the copyright
and licensing information found in file.

The use of REUSE.toml files was introduced in REUSE version 3.1.0a1.
This reuse version is compatible with reuse specification
version 3.2 [1].
With this commit's files,
* The SPDX document generated by reuse spdx conforms to SPDX 2.3,
* The reuse lint command reports that the Qt project is reuse compliant.

[1]: https://reuse.software/spec-3.2/

Task-number: QTBUG-124453
Task-number: QTBUG-125211
Pick-to: 6.8
Change-Id: I01023e862607777a5e710669ccd28bbf56091097
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Joerg Bornemann <joerg.bornemann@qt.io>
2024-11-05 14:36:16 +01:00
Eskil Abrahamsen Blomfeldt
b614aa10b9 Extract emoji data from Unicode files
Expand unicode data to include information needed to
parse emoji sequences. This is a pre-requisite for
automatically preferring color fonts for emojis.

As a drive-by, this also fixes a double space in the
output of the uc_properties array.

Task-number: QTBUG-111801
Change-Id: Icd993803c87c69ed278c7724377028f3706d0272
Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2024-08-06 10:00:08 +02:00
Edward Welbourne
d5e40b5e58 Revise UCD-generated data files' SPDX headers
The existing data comes under Unicode-DFS-2016 but future updates
shall come under Unicode-3.0, so update the existing headers with the
former and the generator script with the latter. Leave a note in the
attribution file about this transitional state and how to resolve it.

Replaced UNICODE_LICENSE.txt from src/corelib/text/ with
LICENSES/Unicode-DFS-2016.txt, as fetched using reuse download.
This doesn't look like a rename but only actually adds some irrelevant
lines about where on the Unicode website the upstream files (to which
we do not apply this license) come from and changes some spacing.

Pick-to: 6.7 6.5
Fixes: QTBUG-121653
Change-Id: I50c9f4badc77a9aa402af946561aff58ae9e3e7a
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
Reviewed-by: Kai Köhne <kai.koehne@qt.io>
2024-04-22 15:22:12 +00:00
Ievgenii Meshcheriakov
1f73d4b87c Unicode line breaking: Implement rules LB15a and LB15b
The new rules were added in Unicode 15.1 (TR #14, revision 51).

The rules read:

    LB15a: (sot | BK | CR | LF | NL | OP | QU | GL | SP | ZW)
           [\p{Pi}&QU] SP* ×
    LB15b: × [\p{Pf}&QU] (SP | GL | WJ | CL | QU | CP | EX
           | IS | SY | BK | CR | LF | NL | ZW | eot)

Add two new line breaking classes LineBreak_QU_Pi and _QU_Pf to
represent quotation characters with context that matches left
side of LB15a and right side of LB15b respectively. This way
it is still possible to use the line breaking classes table.

Also add a coment about the original source of the line
break table.

Task-number: QTBUG-121529
Change-Id: Ib35f400e39e76819cd1c3299691f7b040ea37178
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
2024-02-08 17:43:58 +01:00
Ievgenii Meshcheriakov
bfd09ec38c unicode: Import version 15.1 (UCD version 32)
Add enumerator for the new Unicode version to QChar::UnicodeVersion.

Remap new line breaking classes to their Unicode 15.0 values:
* AK, AP and AS to AL,
* VI and VF to CM.
These are classes for new line breaking support for Indic scripts
that require more work.

Blacklist failing tests for now:
* tst_QUrlUts46::idnaTestV2
* tst_QTextBoundaryFinder::lineBoundariesDefault
* tst_QTextBoundaryFinder::graphemeBoundariesDefault

Regenerate the source files.

Task-number: QTBUG-121529
Change-Id: I869cc9fbaa53765d8ae6265c22cdbef9f19d05bf
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2024-02-08 16:43:58 +00:00
Ievgenii Meshcheriakov
1e7f1e5b73 Update Unicode data version string
This amends c4e550703c2bdc1ee710507b8df9c0c9a118402e. The data version
update was just forgotten when updating to Unicode 15.0.

Pick-to: 6.5 6.6 6.7
Change-Id: Ibb3e9cb81e9bbcb5d4aaf4e4df6231485531c128
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2024-01-25 17:37:48 +00:00
Ievgenii Meshcheriakov
c4e550703c Update UCD to Revision 30
This corresponds to Unicode version 15.0.0.

Added the following scripts:

    * Kawi
    * Nag Mundari

Full support of these scripts requires harfbuzz version 5.2.0,
this version adds support for Unicode 15.0:

    https://github.com/harfbuzz/harfbuzz/releases/tag/5.2.0

Fixes: QTBUG-106810
Change-Id: Ib06c526e49b0f01ef9f21123bcf875c6b19f2601
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2022-10-11 14:10:59 +00:00
Yuhang Zhao
34c21d0407 Core: make Unicode Database constexpr
Task-number: QTBUG-100485
Pick-to: 6.3 6.2
Change-Id: I41480a34b14fd86a68a5c10b7e0f3d250e785d0f
Reviewed-by: Marc Mutz <marc.mutz@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2022-05-26 12:42:36 +08:00
Ievgenii Meshcheriakov
838a7a01f3 Unicode: Extract EastAsianWidth property
This property is needed to properly implement the line breaking
algorithm from UAX #14.

Task-number: QTBUG-97537
Pick-to: 6.3
Change-Id: Ia83cc553c9ef19fae33560721630849d2a95af84
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2022-05-24 23:07:43 +02:00
Ievgenii Meshcheriakov
1a26719c54 Unicode: Remove obsolete word break classes
Remove E_Base, Glue_After_Zwj, E_Base_GAZ, and E_Modifier obsoleted by
UTS #29, version 33 (Unicode 11.0.0).

Task-number: QTBUG-97537
Pick-to: 6.2 6.3
Change-Id: If5dc36ae17cd8746bbe81b73bbcc0863181e4a7a
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2022-05-24 23:07:42 +02:00
Lucie Gérard
05fc3aef53 Use SPDX license identifiers
Replace the current license disclaimer in files by
a SPDX-License-Identifier.
Files that have to be modified by hand are modified.
License files are organized under LICENSES directory.

Task-number: QTBUG-67283
Change-Id: Id880c92784c40f3bbde861c0d93f58151c18b9f1
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Jörg Bornemann <joerg.bornemann@qt.io>
2022-05-16 16:37:38 +02:00
Ievgenii Meshcheriakov
826fc8c9bd Update UCD to Revision 28
This corresponds to Unicode version 14.0.0.

Added the following scripts:

    * CyproMinoan
    * OldUyghur
    * Tangsa
    * Toto
    * Vithkuqi

Full support of these scripts requires harfbuzz version 3.0.0,
this version adds support for Unicode 14.0:

    https://github.com/harfbuzz/harfbuzz/releases/tag/3.0.0

With this release 10 test cases in tst_qurluts46 were fixed, one
additional test case is failing in tst_qtextboundaryfinder and
is commented out. In total 62 line break test cases and 44 word
break test cases are failing.

A comment in src/corelib/text/qt_attribution.json was updated to
include the URL of the page containing UCD version number.

Fixes: QTBUG-94359
Change-Id: Iefc9ff13f3df279f91cbdb1246d56f75b20ecb35
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-10-18 16:45:10 +00:00
Ievgenii Meshcheriakov
394f8a9c4b Unicode: Add script to facilitate UCD update
Add a script that downloads UCD data for a given Unicode version,
unpacks it, and copies the used files to appropriate locations inside
the Qt source code.

Also update the README and use an HTTPS link for the UCD data file.
FTP links are no longer supported by some browsers.

Task-number: QTBUG-94359
Change-Id: I2aa70a588f675e411fa6b3ce5b4444a7c07ed707
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-10-05 20:38:02 +02:00
Ievgenii Meshcheriakov
8be862687d unicode: Fix typo s/supersting/superstring/
Thanks to Konstantin Ritt for spotting it.

Change-Id: Ia3b5c4103b315cdb690fcd8b42239f000acdbef0
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-09-15 21:58:12 +02:00
Ievgenii Meshcheriakov
5f3f073126 unicode: Build IDNA map superstring greedily
This slightly reduces memory required for mapping tables:

    uncompressed size: 1146 characters
    consolidated size: 904 characters
    memory usage: 47856 bytes

Task-number: QTBUG-85323
Change-Id: Ic960789e433e80acf1a4e36791533a1c55a735c8
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-09-03 14:43:16 +02:00
Ievgenii Meshcheriakov
2d78b71fc6 unicode: Pack 2 QChar's into one entry for IDNA mapping
Store up to 2 QChar's for mapping values inside the mapping
table itself. This reduces the size of the superstring for
other mapping values.

results:
    uncompressed size: 1146 characters
    consolidated size: 1001 characters
    memory usage: 48050 bytes

Task-number: QTBUG-85323
Change-Id: I922a6d2037551d0532ddae1a032ec1a9890f40a7
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-09-03 14:43:16 +02:00
Ievgenii Meshcheriakov
ca1eeb23fa unicode: More compact IDNA mapping tables implementation
This implementation stores mapping that are 1 QChar long
inside the mapping tables, the longer mappings are stored as
an offset and length pairs pointing into the common superstring
of all such mapping values.

Size comparison with the existing implementation follows.

old:
    max mapping length: 6
    memory usage: 103608 bytes

new:
    uncompressed size: 3250 characters
    consolidated size: 2367 characters
    memory usage: 50782 bytes

Task-number: QTBUG-85323
Change-Id: I9f2e32438dd463457e0fcd783136bb17145e27a8
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-09-03 14:43:16 +02:00
Ievgenii Meshcheriakov
4e460aa3f7 tests: Add test for UTS #46 implementation using Unicode data
The new test is called tst_qurluts46. It verifies QUrl::{to,from}Ace()
functionality using the data from IdnaTestV2.txt supplied by Unicode.

The file was downloaded from
https://www.unicode.org/Public/idna/13.0.0/IdnaTestV2.txt

Task-Id: QTBUG-85371
Change-Id: I4c6a4942ef6018dafc90cb84ef73f6b2614566d7
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-09-01 14:15:45 +02:00
Ievgenii Meshcheriakov
2afe1a3c19 unicode: Generate tables for IDNA/UTS #46
Update the Unicode data processing tool to generate properties
and mapping tables needed to implement UTS #46
(https://unicode.org/reports/tr46/). The implementation extends
the standard to allow usage of underscores in URLs. This is done
for compatibility with DNS-SD and SMB protocols.

The data file needed to generate the new properties was taken from
https://www.unicode.org/Public/idna/13.0.0/IdnaMappingTable.txt

Task-number: QTBUG-85323
Change-Id: I2c303bf8a08aefb18a7491fb9b55385563bfa219
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-08-26 16:55:05 +02:00
Giuseppe D'Angelo
a794c5e287 Unicode: fix the extended grapheme cluster algorithm
UAX #29 in Unicode 11 changed the EGC algorithm to its current form.
Although Qt has upgraded the Unicode tables all the way up to
Unicode 13, the algorithm has never been adapted; in other words,
it has been working by chance for years. Luckily, MOST
of the cases were dealt with correctly, but emoji handling
actually manages to break it.

This commit:

* Adds parsing of emoji-data.txt into the unicode table generator.
  That is necessary to extract the Extended_Pictographic property,
  which is used by the EGC algorithm.

* Regenerates the tables.

* Removes some obsoleted grapheme cluster break properties, and
  adds the ones added in the meanwhile.

* Rewrites the EGC algorithm according to Unicode 13. This is
  done by simplifying a lot the lookup table. Some rules (GB11,
  GB12, GB13) can't be done by the table alone so some hand-rolled
  code is necessary in that case.

* Thanks to these fixes, the complete upstream GraphemeBreakTest
  now passes. Remove the "edited" version that ignored some rows
  (because they were failing).

Change-Id: Iaa07cb2e6d0ab9deac28397f46d9af189d2edf8b
Pick-to: 6.1 6.0 5.15
Fixes: QTBUG-92822
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
2021-04-16 20:31:39 +02:00
Giuseppe D'Angelo
8546e017c0 util/unicode: enable asserts unconditionally
If one "accidentally" uses a release build of the unicode tool,
the asserts within it won't fire. Enable them in all cases.

Change-Id: I9d63641dc6d6d2e5805b61b36f8c28e624b25e12
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-04-16 16:49:28 +02:00
Edward Welbourne
78cf89c07d Use checked string iteration in case conversions
The Unicode table code can only be safely called on valid code-points.
So code that calls it must only pass it valid Unicode data. The string
iterator's Unchecked Unchecked methods only provide this guarantee
when the string being iterated is guaranteed to be valid UTF-16; while
client code should only use QString, QStringView and friends on valid
UTF-16 data, we have no way to be sure they have respected that.

So take the few extra cycles to actually check validity in the course
of iterating strings, when the resulting code-points are to be passed
to the Unicode table look-ups. Add tests that case mapping doesn't
access Unicode tables out of range (it'll trigger the new assertion).
Added some comments to qchar.h that helped me understand surrogates.

Change-Id: Iec2c3106bf1a875bdaa1d622f6cf94d7007e281e
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-08-29 18:15:27 +02:00
Edward Welbourne
1fb35832df Simplify initialization of UnicodeData and PropertyFlags structs
Initialize values where they're declared, where possible.

Change-Id: Ib6bf33b27b19c76f406f78bc8a1bd9729bd8f2cd
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-08-28 21:27:51 +02:00
Edward Welbourne
a111dd26b1 Document the indexing used in the Unicode tables
Make clear why we don't need to assert against out-of-bounda accesses
in the generated code, provided the code point is within its bound,
(Using one table's early entries as indices into later in the same
table at which to look up indices into another table made it a little
hard to work out what was going on, especially as nothing told me
about the early / late distinction. Record what I discovered, to save
the next person to stumble into this some confusion.)

Change-Id: I8e5771a7f3d70c1911aeae1b0cabe5c47bc7e9c7
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-08-20 09:02:00 +02:00
Edward Welbourne
ca034e4e50 Inline two macros in the unicode tables
They were only used by one function each, in unicodetables.cpp, so
don't need to be macros.

Change-Id: I3e7f9f661568862d0a0d265bb8f657a8e0782b13
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-08-12 13:47:56 +02:00
Edward Welbourne
f12ddfaecc Tidy up unicode table generation
Eliminate some needless parentheses, tidy up some spacing and
indentation and split some long lines.  Change first += after
declaration to initializer.

Change-Id: I05ff2a6337b7ed14e0a2dc9c03fc784c92b63515
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-08-05 10:02:11 +02:00
Edward Welbourne
e3e6d58cad Use %zd for size-type formatting in unicode table generator
Qt6 makes sizes qsizetype; and one of these was already sizeof()-sized.
While qsizetype might not be ssize_t, it's at least no bigger, so we
can safely use its format specifier, with a suitable cast.

Change-Id: I433f654f6b139d74b4d5358b804b44ab1f0ada15
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
2020-08-04 13:28:34 +02:00
Edward Welbourne
e536fc7975 Fix deprecation warnings (s/hex/Qt::hex/gw) in unicode table generator
Removed three warnings, rather than fixing them, as Konstantin Ritt
tells me they've been redundant since Unicode 6 or so.

Change-Id: I4507e852bceb08a0252c77a8b383aceac212aad9
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
2020-08-04 13:28:34 +02:00
Edward Welbourne
b69092b13e Fix compilation error in unicode table generator
Don't include a QString::number() in a sum of QByteArray and C strings.

Change-Id: I7544e835fcf5625b1fe1ee2055a48600200daafd
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
2020-08-04 13:28:34 +02:00
Jarek Kobus
a7f9d5a7fa Use QList instead of QVector in util
Task-number: QTBUG-84469
Change-Id: I077fb5c32456d438a457c1f73852313ea2ea9ae5
Reviewed-by: Friedemann Kleint <Friedemann.Kleint@qt.io>
2020-07-07 20:34:48 +02:00
Karsten Heimrich
18ec53156e Move QTextCodec support out of QtCore
* Assume UTF-8 on all Unix like systems
* Export some functions to be able to compile QTextCodec once
  moved to Qt5Compat.

Task-number: QTBUG-75665
Change-Id: I52ec47a848bc0ba72e9c7689668b1bcc5d736c29
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2020-06-20 02:04:38 +02:00
Giuseppe D'Angelo
3e1d03b1ea Port Q_STATIC_ASSERT(_X) to static_assert
There is no reason for keep using our macro now that we have C++17.
The macro itself is left in for the moment being, as well as its
detection logic, because it's needed for C code (not everything
supports C11 yet).  A few more cleanups will arrive in the next few
patches.

Note that this is a mere search/replace; some places were using
double braces to work around the presence of commas in a macro, no
attempt has been done to fix those.

tst_qglobal had just some minor changes to keep testing the macro.

Change-Id: I1c1c397d9f3e63db3338842bf350c9069ea57639
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2020-06-19 19:38:23 +02:00
Marc Mutz
19e7c0d2b5 QChar/QString: centralize case folding in qchar.cpp
There are (at least) two implementations of the low-level case-folding
algorithm, one of which (for QChar::toLower()) seems to be wrong (it
doesn't deal with special cases which expand to more than one code
point).

The algoithm hidden in QString and entangled with the QString
detaching code makes reusing the code much harder.

At the same time, the dependency of the algorithm on the unicode
tables makes exposing a non-allocating result type in the public API
hard. std::u16string would be an alternative if we can assure that all
implementations use SSO with at least four characters.

So, for the time being, leave this as internal API for use in an
upcoming QStringView::toLower() as well as case-insensitive hashing.

Change-Id: Iabb2611846f6176776aa20e634f44d8464f3305c
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-05-09 06:25:05 +00:00
Marc Mutz
7b04e0012b QUnicodeTables: port to charNN_t
This makes existing calls passing uint or ushort ambiguous, so
fix all the callers. There do not appear to be callers outside
QtBase. In fact, the ...BreakClass() functions appear to be
utterly unused.

Change-Id: I1c2251920beba48d4909650bc1d501375c6a3ecf
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
2020-04-27 13:08:41 +02:00
Marc Mutz
20cdf807b1 QChar: port low-level functions from uint/ushort to char32/16_t
Now that the standard gives us proper types for UTF-16 and UTF-32
characters, use them. Will eventually make the code much easier to
read than today, where uint could be an index as well as a char32_t.

It also ensures that the result of e.g. QChar::highSurrogate() can
still be implicitly converted to a QChar now that the
QChar(non-characater-integral-types) ctors are being made explicit.

[ChangeLog][QtCore][QChar] All low-level functions
(e.g. highSurrogate()) now take and return char16_t instead of ushort
and char32_t instead of uint.

Change-Id: I9cd8ebf6fb998fe1075dae96c7c4484a057f0b91
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
2020-04-24 12:45:53 +02:00
Edward Welbourne
54f8be6cc0 Update UCD to Revision 26
Include WordBreakTest.html, since a test uses sample strings from it,
albeit without actually reading the file.

Had to comment out more of the new tests, as at Revision 24, pending
an update to harfbuzz and the text boundary detection code.

Task-number: QTBUG-79631
Task-number: QTBUG-79418
Task-number: QTBUG-82747
Change-Id: I0082294b09d67ffdc6a9b5c15acf77ad3b86f65f
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2020-03-14 11:26:59 +01:00
Edward Welbourne
65ea4948dc Unicode tables: minor prettification
Put blank lines before the final Num*Classes entries in enums, to set
them off visibly from the "real" members. Moved some oddly placed
commas to the ends of preceding lines, so that later additions can
just add lines (with comma on end) without having to modify the
preceding line while doing so.

Change-Id: I5188dc25af9e4c17a1882fd9dab070e88013060b
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2019-11-28 11:04:39 +01:00
Edward Welbourne
1a1718b342 Add missing docs for UCD additions at 5.15
Also remove two stray commas pointed out in code-review and some
others noticed on checking for similar.
This amends commit c3eb521a0f10112df6b61d2592351c4eef2e1f9b.

Change-Id: If20c5146b740defe8d25ff61d399031b5c66ded1
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2019-11-28 11:04:27 +01:00
Edward Welbourne
c3eb521a0f Update UCD data to Unicode 12.1.0's Revision 24
Had to teach the update program to accept category Lm as for
Joining_Transparent, for the sake of a new ArabicShaping.txt entry.
Added three new Unicode versions, several new scripts and a new
word-break class.

Updated UCD's test data for tst_QTextBoundaryFinder.  This left 57
tests failing; I have commented out the data rows for those tests,
pending someone with more knowledge addressing this.

Task-number: QTBUG-79631
Task-number: QTBUG-79418
Change-Id: Ic33d3b3551195d47a84d98e84020f57a68f0b201
Reviewed-by: Eskil Abrahamsen Blomfeldt <eskil.abrahamsen-blomfeldt@qt.io>
2019-10-30 17:38:02 +01:00
Edward Welbourne
6852ba815d Correct some references to corelib/tools/ to say corelib/text/
The Unicode data tables moved with QString and friends.
So did the locale data generated from CLDR.

This amends commit a9aa206b7b8ac4e69f8c46233b4080e00e845ff5.

Change-Id: If12f0420b559dcb78993adc00e9f39751bca684a
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
2019-10-25 11:44:27 +02:00
Marc Mutz
effbf147a4 QUnicodeTables: use array for case folding tables
Instead of four pairs of :1 :15 bit fields, use an array of four :1,
:15 structs.  This allows to replace the case folding traits classes
with a simple enum that indexes into said array.

I don't know what the WASM #ifdef'ed code is supposed to effect (a :0
bit-field is only useful to separate adjacent bit-field into separate
memory locations for multi-threading), but I thought it safer to leave
it in, and that means the array must be a 64-bit block of its own, so
I had to move two fields around.

Saves ~4.5KiB in text size on optimized GCC 10 LTO Linux AMD64 builds.

Change-Id: Ib52cd7706342d5227b50b57545d073829c45da9a
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2019-09-04 16:35:37 +00:00
Marc Mutz
2737b5e36a QUnicodeTables: pack Properties struct
GCC doesn't like the sequence

   : 5
   : 5
   : 8
   : 6
   : 8

and inserts a :6 padding between the :5 and the :8 and a :2 padding
between the :6 and the :8, growing the bitfield by 8 bits of embedded
padding and another byte to bring the struct back to sizeof % 2 == 0.

Fix by reshuffling the elements and adding a static_assert for the
next round.

Saves ~5KiB in QtCore executable size.

Change-Id: I4758a6f48ba389abc2aee92f60997d42ebb0e5b8
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2019-09-04 10:06:19 +02:00
Edward Welbourne
a9aa206b7b Move text-related code out of corelib/tools/ to corelib/text/
This includes byte array, string, char, unicode, locale, collation and
regular expressions.

Change-Id: I8b125fa52c8c513eb57a0f1298b91910e5a0d786
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
2019-07-10 17:05:30 +02:00
Sona Kurazyan
ff2b2032a0 Remove usages of deprecated APIs from QtAlgorithms
Task-number: QTBUG-76491
Change-Id: I9dab736a0cbd2e86588919640c26e8ce6b3674d0
Reviewed-by: Alex Blasche <alexander.blasche@qt.io>
Reviewed-by: Leena Miettinen <riitta-leena.miettinen@qt.io>
2019-06-29 21:58:36 +02:00
Allan Sandfeld Jensen
95f787bfdc Replace Q_DECL_NOTHROW with noexcept the remaining places
The first replacement had missed objective-C++ code some places ourside
the src dir.

In C-files Q_DECL_NOTHROW is replaced with Q_DECL_NOEXCEPT as we still
need to turn it off when compiled in C mode, but can get rid of the old
NOTHROW moniker.

Change-Id: I6370f57066679c5120d0265a69e7e378e09d4759
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2019-04-09 14:48:42 +00:00
Eskil Abrahamsen Blomfeldt
01380dc267 Remove broken code from unicode generator
The current state produces uncompilable code.

Change-Id: I9a68b61866a4a416335ed4d7204c58122803fb1c
Reviewed-by: Eskil Abrahamsen Blomfeldt <eskil.abrahamsen-blomfeldt@qt.io>
2019-03-18 15:17:10 +00:00