The lack of this was hidden by other rules (redundant with it) until
CLDR v45, but v46 prunes the redundant rules, breaking this. So
include the missing rule and tweak the code that assumed likely
sub-tag rules preserved language, since this one doesn't. Rework the
tail of withLikelySubtagsAdded() to correctly use this rule, now that
we have it. (The prior comment about there being no match-all was
wrong: CLDR did have it, but our data skipped it.) Amended one test
affected by it (when system locale wasn't en_US).
Pick-to: 6.8
Task-number: QTBUG-130877
Change-Id: I2a415b67af4bc8aa6a766bcc1e349ee5bda9f174
Reviewed-by: Mate Barany <mate.barany@qt.io>
The AnyTerritory entries in the zoneDataTable are derived from
territory="ZZ" entries in the upstream CLDR data; the World ones from
territory="001". The latter give the default IANA ID for each MS ID,
the former give an (often legacy) IANA ID for the MS ID, that is not
based on geography. Some of these are being removed at CLDR v46.
The documentation said the ZZ entries have "no known territorial
association", hinting that there may be some (unknown) territorial
association; however, CLDR's inclusion of them is as entries with a
known non-territorial association, so revise the phrasing to reflect
this.
Also document that windowsIdToDefaultIanaId() returns empty when
there is no territory-specific value, and callers can use the
territory-neutral call to get a suitable value in that case. (They
may, however, wish to distinguish this case, to treat it differently,
so I decided not to just return that in place of empty in any case.)
The upstream CLDR tables do have entries for territory 001, so we
should report these if asked for World as territory. Amend the
available zone ID lookup and mapping from MS to IANA functions that
take a territory to duly handle World via the default-data that was
derived from 001 data in CLDR, instead of from the territory-varying
table, from which those were effectively filtered out when generating
the two tables. Update docs to mention this handling of World, for
contrast with that of AnyTerritory.
In the process remove a spurious split-on-space from the MS to default
IANA lookup, asserting there is no space (in a field now stored in the
table for single IANA ID entries, instead of the one for space-joined
lists of them in which it used to be stored, before I noticed it's
always only one ID). There is a matching assertion in the cldr.py code
that extracts the data. Added an assertion to this last, that each
default IANA ID given by CLDR's MS data does in fact also appear as
one of the IANA IDs for at least one territory (potentially ZZ), and
comment in C++ code on why this means we don't need to scan the
windowsDataTable in a few places, where it would just produce
duplicate entries.
[ChangeLog][QtCore][QTimeZone] Corrected handling of QLocale::World
and clarified in docs how QLocale::AnyTerritory is handled when
QTimeZone selects zones by territory.
Pick-to: 6.8
Task-number: QTBUG-130877
Change-Id: I861c777c68b0cb73a194138fe23fbff839df49e6
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Also fix the annotation of englishNaming in cldr.py. Spotted it while
annotating __enumTable.
Task-number: QTBUG-129564
Pick-to: 6.8
Change-Id: I93f698b4cf1b5ae90c21fe77330e4f167143a9f3
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Add some type annotatons to cldr2qlocalexml.py as well. Based on the
default arguments the constructor of CldrReader was expecting callables
that return None, but in reality we are passing in functions that
return integers.
Task-number: QTBUG-129613
Pick-to: 6.8
Change-Id: I06832240956ea635ca0cc0ec45c466a3b2539ff7
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
A missing update of a "last" variable meant the loop inevitably did
nothing useful. Include type-annotation for last, while doing this.
Thankfully the check still doesn't find any duplications, now that
I've fixed it so that actually would, were any present.
Pick-to: 6.8 6.5
Change-Id: I672e6570359a3ff102a364d8af98c5c8c0bdc4d9
Reviewed-by: Mate Barany <mate.barany@qt.io>
Found these while adding type annotations.
Task-number: QTBUG-129566
Pick-to: 6.8
Change-Id: I51c8e5676f958094946c0e6f396b98c083fd9de0
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Apparently there used to be a mechanism where an alias element in a
top-level LDML element could serve to provide a parent locale as its
source attribute. That is long gone and, since at least a decade ago,
alias elements only ever appear in root.xml, with source="locale" and
a path that starts ../ (so is a relative XPath).
Ditch some complications (that I transcribed faithfully five-ish years
ago when transforming the scripts), replacing them with assertions
that check what's now documented in the LDML spec and confirmed by my
own grep-checks in the CLDR data. This incidentally made one prior
(weaker) check redundant, so I've now removed that from the look-up
for the tags that identify a locale. That look-up is only ever
performed after the DOM root nodes it uses have come through the scan
of locale roots that now does the stronger check.
Makes no difference to generated data.
Change-Id: I811ffbef5f5ecb69183d68fa8bda57281f2a579d
Reviewed-by: Mate Barany <mate.barany@qt.io>
The variable ianalist is not really used for anything, it was probably
meant to be ianaList.
Pick-to: 6.8
Change-Id: Ie9f42bf9716da28ee0017319dda96389c415ef4f
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
This makes the XML file bigger by a factor of roughly 8, at about 30
MB. Code to read the new data out of it shall follow in a later
commit.
Task-number: QTBUG-115158
Change-Id: I7b9b6abe88be2457fa6cf0e8d7b6a68845136770
Reviewed-by: Mate Barany <mate.barany@qt.io>
This also expands the IANA ID table (by about 5 KiB) even when the
feature is inactive, since it includes all IANA zones referenced by
the new data, as well as those for which CLDR has aliases.
Add code to QTZlocale.cpp to use this locale-independent data. This
shall need expanded once locale-dependent data is also available.
Task-number: QTBUG-115158
Change-Id: I720f10cb9ae4cf87dfd8bb66af965a45d49c389a
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
This is the locale-independent part of the data, for inclusion in
qtimezoneprivate_data_p.h in some form.
Task-number: QTBUG-115158
Change-Id: Ic46f53dd22d45ddc999633bc1bb4a0a3cf6d5112
Reviewed-by: Mate Barany <mate.barany@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
The existing naming lists provide the needed mapping and this prepares
the way to move the language, script and territory into the from and
to elements as attributes, saving some file-size. It incidentally
pushes the mapping to enum values upstream and simplifies the
downstream processing.
Change-Id: I8f6d2615d52b14d46d1b795539c71f8afdc310ca
Reviewed-by: Dennis Oberst <dennis.oberst@qt.io>
The qlocalexml.py Locale.C() had to replicate a whole lot of data that
isn't really relevant to how C differs from en_US and every addition
to what we support required further additions to it. So pass the en_US
Locale object to the pseudoconstructor so that C can inherit from it
and only override the parts where we care about the difference.
Hand-code shortening for short Jalali month names, to match Soroush's
original contribution, and include the narrow forms in the hard-coded
data to keep the generated data unchanged (for now). Note some of the
departures from CLDR; we may want to drop these overrides later.
In the process, convert the mapping from keys to locales to
consistently use IDs for all members of the key, instead of using the
(empty) code value for (as yet unused) variant; it now gets ID 0 and
is consistent with returns from codesToIdNames(). This makes life
easier for the code that now has to construct an en_US key.
Task-number: QTBUG-115158
Change-Id: I3d7acb6a4059daec1bba341fcf015c39c7a6803b
Reviewed-by: Kai Köhne <kai.koehne@qt.io>
Future work shall need the timezone alias data to be synchronized
between the (expanded) locale-independent timezone data and the
(coming) locale-dependent timezone data. The latter shall need to come
via QLocaleXml, hence the former now needs to, too.
This makes no change to the generated data, aside from changing the
regeneration instructions for qtimezoneprivate_data_p.h, to use the
same scripts as locale data, instead of cldr2qtimezone.py, which is
now removed.
Task-number: QTBUG-115158
Change-Id: I47ddd95f6af1855cbb1f601e9074c13f213cd61c
Reviewed-by: Mate Barany <mate.barany@qt.io>
The CLDR's "IANA" IDs may (for the sake of stability) date back to
before IANA's own naming has been updated. As a result, the "IANA" IDs
we were using were in some cases out of date. CLDR does provide a
mapping from its stable IDs to all aliases and the current IANA name
for each (which I shall soon be needing in other work), so use that to
map the CLDR IDs to contemporary IANA ones.
Revise the documentation of CldrAccess.readWindowsTimeZones() to take
this into account, pass it the alias mapping from the table, use that
to map IDs internally and, in passing, rename a variable. Update
cldr2qtimezone.py to match the new CldrAccess methods and regenerate
the data.
Change-Id: I23d8a7d048d76392099d125376b544a41faf7eb3
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Mate Barany <mate.barany@qt.io>
There are various legacy IANA IDs that we should recognize as aliases
for their contemporary equivalents. Later work shall also take these
into account in the Windows IDs. Scan CLDR's data about these aliases
and use it when constructing QTimeZone. This adds aliasMappingTable
and aliasIdData arrays to QTZP_data_p.h and an AliasData type to its
QtTimeZoneCldr namespace.
Change-Id: I1bbfce62959a7e1b7a0bc4a320c32f5a174a2ff2
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
They're a bit more readable than calling dict on a generator.
Change-Id: I3177e31b1f617b80d1cf5d5f83df7036fc0c4c01
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
Although the code does not, in fact, know about them, it's more
pertinent to say that they're unsupported than to say that the variant
in question is unknown.
Change-Id: I411d792dc91f2d7af58a4b7919c952a005b3417e
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
A comment dated from when variables misleadingly named language_list,
script_list and country_list actually held mappings not lists; they've
been renamed to s/list/map/ a while back, so rephrase.
Use a dict-comprehension rather than the somewhat lisp-ier invocation
of the dict constructor on an iterator over pairs.
Change-Id: Ibcb97122434122dbb1dcb0f621aae02b25a4e1fa
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
Various comments need to continue using the enumdata.py names, as they
associate data with particular enum members, but we can now correctly
use the en.xml versions of their names when we report them, rather
than the enum-friendly names we use in the code. Since this now means
the data may stray outside plain ASCII - it'll be UTF-8-encoded - this
implies replacing the QLatin1StringView()s of the code that formerly
read this data with QString::fromUtf8().
Fixes: QTBUG-94460
Change-Id: Id3b08875a46af58c0555c3e303b0e15a19441509
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Moving it makes it easier to document what it's up to and why, while
leaving __checkEnum() easier to read; and I'm going to need it
elsewhere anyway. This makes no difference to generated data.
Task-number: QTBUG-94460
Change-Id: I684375bc926d5d54928fbf5b5e08978528aef487
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
Prefer stand-alone versions of the names when available. This saves
the need for a Han-specific kludge in the check for discrepancies
between our enum names and the en.xml names. Causes no change to
generated locale data.
Pick-to: 6.6 6.5
Change-Id: I162f3107d6ffc1f8b893b206e0b78b61cf7254f6
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
In the Windows zone-ID code, we tokenize() a text extracted from CLDR
data. However, a leading or trailing space (or a repeated internal
space) would then give an empty "IANA ID" for us to match, causing the
empty ID to be mapped to the Windows ID for the entry with the
superfluous space. This was uncovered by an entry with a trailing
space in CLDR v43's data.
Canonicalize spacing in the IANA ID lists extracted from CLDR so as to
ensure this doesn't happen. (We could pass Qt::SkipEmptyParts to the
tokenize() call, but fixing the issue when generating the data is
cheaper and more robust than fixing it at run-time every time it's
consulted.)
Task-number: QTBUG-111550
Change-Id: Ib3883419558d6574141e9ab0bc929ade2d73e020
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
From CLDR v43, "The parentLocale elements now have an optional
component attribute, with a value of segmentations or
collations. These should be used for inheritance for those respective
elements." Since we aren't extracting collation or segmentation data
for the present, omit these elements from the scan for parentLocale
information.
Task-number: QTBUG-111550
Change-Id: I42871929f539c1852471812801953f2fc8be0e8a
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
Replace the current license disclaimer in files by
a SPDX-License-Identifier.
Files that have to be modified by hand are modified.
License files are organized under LICENSES directory.
Task-number: QTBUG-67283
Change-Id: Id880c92784c40f3bbde861c0d93f58151c18b9f1
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Jörg Bornemann <joerg.bornemann@qt.io>
pathlib's API is more modern and easier to use than os.path. It
also allows to distinguish between paths and other strings in type
annotations.
Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: Ie6d9b4e35596f7f6befa4c9635f4a65ea3b20025
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Replace most uses of str.format() and string arithmetic by f-strings.
This results in more compact code and the code is easier to read
when using an appropriate editor.
Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: I3409f745b5d0324985cbd5690f5eda8d09b869ca
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
The behavior of StopIteration in generators was changed in Python 3
(see https://www.python.org/dev/peps/pep-0479/). Not raising that
exception makes it easier to port the code to Python 3.
Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: Iac6e3f6f1e1e8ef3a1a0d89b19d2ac2d186434f5
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Change the nomenclature used in the scripts and the QLocaleXML data
format to use "territory" and "territories" in place of "country" and
"countries". Does not change the generated source files.
Change-Id: I4b208d8d01ad2bfc70d289fa6551f7e0355df5ef
Reviewed-by: JiDe Zhang <zhangjide@uniontech.com>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
These variables provide mappings, not lists, so name them non-deceptively.
Change-Id: Idf15e78ad73790bc86dd8b9d4f248d1c4f73993c
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
The use of "Country" is misleading as some entries in the enumeration
are not countries (eg, HongKong), for all that most are. The Unicode
Consortium's Common Locale Data Repository (CLDR, from which QLocale's
data is taken) calls these territories, so introduce territory-based
names and prepare to deprecate the country-based ones in due course.
[ChangeLog][QtCore][QLocale] QLocale now has Territory as an alias for
its Country enumeration, and associated territory-based names to match
its country-named methods, to better match the usage in relevant
standards. The country-based names shall in due course be deprecated
in favor of the territory-based names.
Fixes: QTBUG-91686
Change-Id: Ia1ae1ad7323867016186fb775c9600cd5113aa42
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Compare the code->name mappings we're using to the ones CLDR's
common/main/en.xml provides; report discrepancies. Tolerate tags
missing from en.xml if they're known to the locale-inheritance
machinery.
Change-Id: Ibe96c18bf55984a35de3b3644f3586a9f30720b2
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
Conflicts:
examples/opengl/doc/src/cube.qdoc
src/corelib/global/qlibraryinfo.cpp
src/corelib/text/qbytearray_p.h
src/corelib/text/qlocale_data_p.h
src/corelib/time/qhijricalendar_data_p.h
src/corelib/time/qjalalicalendar_data_p.h
src/corelib/time/qromancalendar_data_p.h
src/network/ssl/qsslcertificate.h
src/widgets/doc/src/graphicsview.qdoc
src/widgets/widgets/qcombobox.cpp
src/widgets/widgets/qcombobox.h
tests/auto/corelib/tools/qscopeguard/tst_qscopeguard.cpp
tests/auto/widgets/widgets/qcombobox/tst_qcombobox.cpp
tests/benchmarks/corelib/io/qdiriterator/qdiriterator.pro
tests/manual/diaglib/debugproxystyle.cpp
tests/manual/diaglib/qwidgetdump.cpp
tests/manual/diaglib/qwindowdump.cpp
tests/manual/diaglib/textdump.cpp
util/locale_database/cldr2qlocalexml.py
util/locale_database/qlocalexml.py
util/locale_database/qlocalexml2cpp.py
Resolution of util/locale_database/ are based on:
https://codereview.qt-project.org/c/qt/qtbase/+/294250
and src/corelib/{text,time}/*_data_p.h were then regenerated by
running those scripts.
Updated CMakeLists.txt in each of
tests/auto/corelib/serialization/qcborstreamreader/
tests/auto/corelib/serialization/qcborvalue/
tests/auto/gui/kernel/
and generated new ones in each of
tests/auto/gui/kernel/qaddpostroutine/
tests/auto/gui/kernel/qhighdpiscaling/
tests/libfuzzer/corelib/text/qregularexpression/optimize/
tests/libfuzzer/gui/painting/qcolorspace/fromiccprofile/
tests/libfuzzer/gui/text/qtextdocument/sethtml/
tests/libfuzzer/gui/text/qtextdocument/setmarkdown/
tests/libfuzzer/gui/text/qtextlayout/beginlayout/
by running util/cmake/pro2cmake.py on their changed .pro files.
Changed target name in
tests/auto/gui/kernel/qaction/qaction.pro
tests/auto/gui/kernel/qaction/qactiongroup.pro
tests/auto/gui/kernel/qshortcut/qshortcut.pro
to ensure unique target names for CMake
Changed tst_QComboBox::currentIndex to not test the
currentIndexChanged(QString), as that one does not exist in Qt 6
anymore.
Change-Id: I9a85705484855ae1dc874a81f49d27a50b0dcff7
When doing XPATH searches, child nodes that have distinguished
attributes that were not asked for should be skipped. This is part of
the LDML spec and matters when resolving locale inheritance. Scan the
LDML DTD (previously only scanned for the CLDR version) to find which
attributes of which tags are ignorable - all others are distinguished
- and take the result into account when performing XPATH searches.
The XPath we were using for currency formats wasn't excluding
currencyFormatLength elements with type="short" and patterns specific
to thousands (and larger multiples); this is fixed by taking
distinguished attributes into account. However, the XPATH also wasn't
specifying the always distinguished attribute type="standard" that
was, in practice, used for nearly all locales that weren't (wrongly)
using short-forms for thousands; so type="standard" is now made
explicit, so as to minimize the diff.
This leaves only twenty-one locales with a negative currency formats.
A later commit shall switch to using accounting by default (it falls
back via an alias to standard, in any case), thereby restoring the two
mentioned below that were using it by accident, but the present change
gives the minimal diff here.
Thousands-specific formats replaced with sensible ones:
* zh_Hant_{HK,MO} (Traditional Mandarin, Hong Kong and Macau)
* eo_001 (Esperanto)
* fr_CA (Canadian French)
* ha_* (Hausa, when not written in Arabic)
* es_{GT,MX,US} (Spanish - Guatemala, Mexico, USA)
* sw_KE (Swahili, Kenya)
* yi_001 (Yiddish)
* mfe_MU (Morisyen, Mauritius)
* lag_TZ (Langi, Tanzania)
* mgh_MZ (Makhuwa Meetto, Mozambique)
* wae_CH (Walser, Switzerland)
* kkj_CM (Kako, Cameroon)
* lkt_US (Lakota, USA)
* pa_Arab_PK (Punjabi, in Arabic script, as used in Pakistan; uses
arabext number system, whose currency falls back to latn's, for
which pa_Arab over-rides the thousands-format).
Format changed from an over-ridden type="accounting" to standard (so
these lost a negative-specific form) in:
* en_SI (English, Slovenia)
* es_DO (Spanish, Dominican Republic; same)
For some locales we were picking up over-rides of narrow or short list
formats, or formats for or-lists or unit-lists rather than and-lists,
in place of the standard list format, that these locales don't
over-ride, provided by a parent locale. This changed list formats for:
* en_CA, en_IN (dropped "Oxford" comma before "and")
* qu_* (Quechua; dropped "utaq", presumably meaning "and")
* ur_IN (Urdu, India; was using unit-list formats)
[ChangeLog][QtCore][QLocale] Data used for currency formats in several
locales and list patterns in some locales have changed due to now
parsing the CLDR data more faithfully.
Fixes: QTBUG-81344
Change-Id: I6b95c6c37db92df167153767c1b103becfb0ac98
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
Move the code out to a CldrReader class in cldr.py, expand CldrAccess
with facilities that needs, expand ldml.py to include support for more
features, finally making xpathlite.py redundant. This initial commit
aims, though, to be bug-for-bug compatible with xpathlite in its
reading of the CLDR data.
It turns out we've been using draftier data than we were aware of
(which might not be a bad thing). The xpathlite code appeared to check
for draft attributes, but these only appear on leaf nodes and most
data were fetched by finding a parent and then scanning its children
without the draft check; only am/pm data was actually being excluded
based on draft values. (We allowed contributed, for am/pm, in
addition to approved, which is all the xpathlite code allows
otherwise.) There are also some less equivocal bugs; I'll deal with
these in later commits.
Simplified number-system data look-ups; the old get_number_in_system()
was taking care of old LDML versions' placement of the number system
attribute; this is no longer needed. (It was also being used for a
currency value to which it was not appropriate, which is now handled
separately; this is one of the bugs mentioned above.) Ditched a
fall-back to nativeZeroDigit, which no longer exists in CLDR.
Change the command-line to take the root of the CLDR data tree, rather
than its common/main/ sub-directory. Support naming the file to which
to write output, as a second command-line argument, instead of always
writing to stdout (which remains the default) and leaving whoever runs
the script to redirect stdout.
Support (internally for now, while adding TODOs to give main() more
command-line options) separating the stderr output into its more and
less interesting parts; for now, continue producing both, but suppress
the least interesting entirely.
Task-number: QTBUG-81344
Change-Id: Ie611b47403a9452b51feaeeaaa0fbc8f7e84dc71
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
This begins the process of replacing xpathlite.py, adding low-level
DOM-access classes to ldml.py and the CldrAccess class to cldr.py
Moved a format comment from cldr2qtimezone.py's doc-string to the
method of CldrAccess that does the actual reading.
Task-number: QTBUG-81344
Change-Id: I46ae3f402f8207ced6d30a1de5cedaeef47b2bcf
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>