The lack of this was hidden by other rules (redundant with it) until
CLDR v45, but v46 prunes the redundant rules, breaking this. So
include the missing rule and tweak the code that assumed likely
sub-tag rules preserved language, since this one doesn't. Rework the
tail of withLikelySubtagsAdded() to correctly use this rule, now that
we have it. (The prior comment about there being no match-all was
wrong: CLDR did have it, but our data skipped it.) Amended one test
affected by it (when system locale wasn't en_US).
Pick-to: 6.8
Task-number: QTBUG-130877
Change-Id: I2a415b67af4bc8aa6a766bcc1e349ee5bda9f174
Reviewed-by: Mate Barany <mate.barany@qt.io>
Although the other parts of the locale-specific data for zone name
L10n were written using safeInTag() I'd foolishly used plain inTag()
for the exemplar city - which, of course, can also contain crazy stuff
and, it turns out, one of them (albeit this may be a CLDR "whoopsie")
does in fact end in a < (in xnr.xml in v46). So use safeInTag()
there as well and be faithful to CLDR (even if this does turn out to
be an error).
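
A minimal sketch of why the escaping matters. This is not the writer's actual safeInTag(): in_tag_escaped(), the tag name and the sample city are hypothetical, chosen only to show that text ending in < survives a round trip through XML when it is escaped on the way out:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def in_tag_escaped(tag, text):
    """Wrap text in <tag>...</tag>, escaping &, < and > in the content."""
    return f"<{tag}>{escape(text)}</{tag}>"

city = "Example<"  # hypothetical exemplar city ending in '<'
element = in_tag_escaped("exemplarCity", city)
# Parsing the escaped form recovers the original text unchanged.
round_trip = ET.fromstring(element).text
```

Without the escape() call, the parser would see the stray < as the start of a malformed tag and reject the document.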
Task-number: QTBUG-130877
Change-Id: Idca22ce689cdd2409c50078498a2badfeecd4de2
Reviewed-by: Mate Barany <mate.barany@qt.io>
Also fix the annotation of englishNaming in cldr.py. Spotted it while
annotating __enumTable.
Task-number: QTBUG-129564
Pick-to: 6.8
Change-Id: I93f698b4cf1b5ae90c21fe77330e4f167143a9f3
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
The other methods that open a tag, potentially with attributes, all
now share __attrJoin() as the tool to combine the attributes and the
tag. Do the same in __openTag().
Change-Id: Ib252b5901b9e1459cbb8c5706ff56f1b7b639d3d
Reviewed-by: Mate Barany <mate.barany@qt.io>
These are probably remnants of times forgotten.
Task-number: QTBUG-129564
Pick-to: 6.8
Change-Id: Ic3ec03201758801e341253cd82ab8034f7fde9b7
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
The lookup into it is done case-insensitively (because user-supplied
names of zones might not have the right case) but I forgot to make the
sorting of the data table case-insensitive in the aliases. Regenerate
data: only the qtimezone*_data_p.h are changed by the reindexing of
zone aliases.
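
To illustrate the mismatch, here is a rough sketch (not the generator's real code; the helper names are hypothetical and the three aliases are just sample data) of a table that is sorted on the same case-folded key that the binary search later uses:

```python
from bisect import bisect_left

def sorted_aliases(aliases):
    """Sort (alias, iana_id) pairs by lower-cased alias."""
    return sorted(aliases, key=lambda pair: pair[0].lower())

def find_alias(table, name):
    """Binary search that compares lower-cased keys on both sides."""
    keys = [alias.lower() for alias, _ in table]
    wanted = name.lower()
    i = bisect_left(keys, wanted)
    if i < len(keys) and keys[i] == wanted:
        return table[i][1]
    return None

# Sample data only; a case-sensitive sort would put "US/Pacific"
# before "Universal", which breaks a case-insensitive search.
aliases = [
    ("US/Pacific", "America/Los_Angeles"),
    ("Universal", "Etc/UTC"),
    ("Zulu", "Etc/UTC"),
]
table = sorted_aliases(aliases)
```

The point is only that the sort key and the search key must agree; if the table is sorted case-sensitively, a case-insensitive bisection can miss entries that are present.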
Pick-to: 6.8
Change-Id: Id5e95c245c7ca421a77298f23baefe6b7021a396
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
The data is very big but much of it is inherited by zones from those
that they map to via likely-subtag reduction, so omit the data where
it coincides with the result of such an inheritance; this shall
complicate the reading of the data, but saves dramatically on its
size, reducing it to "only" c. 2 MiB.
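
The idea can be sketched as follows, assuming (purely for illustration) that the data is a dict of per-entry fields and that a parent_of mapping gives the entry each one inherits from; neither structure nor any of the names below is taken from the actual scripts:

```python
def lookup(data, parent_of, key, field):
    """Walk up the inheritance chain until some entry supplies the field."""
    while key is not None:
        fields = data.get(key, {})
        if field in fields:
            return fields[field]
        key = parent_of.get(key)
    return None

def prune_inherited(data, parent_of):
    """Drop each field whose value matches what its entry would inherit."""
    pruned = {}
    for key, fields in data.items():
        parent = parent_of.get(key)
        pruned[key] = {
            field: value for field, value in fields.items()
            if parent is None or lookup(data, parent_of, parent, field) != value
        }
    return pruned

# Hypothetical sample: "child" inherits from "parent".
parent_of = {"child": "parent"}
full = {
    "parent": {"generic": "Example Time", "standard": "Example Standard Time"},
    "child": {"generic": "Example Time", "standard": "Sample Standard Time"},
}
small = prune_inherited(full, parent_of)
```

Readers of the pruned data must go through something like lookup() rather than indexing directly, which is the extra complexity accepted in exchange for the smaller size.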
Task-number: QTBUG-115158
Change-Id: I53ff13e29f1f73a551d73d75773373bb90673c8e
Reviewed-by: Mate Barany <mate.barany@qt.io>
This includes the data in the Locale objects read prior to writing
CLDR data out to relevant files. Actually writing the new data out
shall follow in a later commit.
Task-number: QTBUG-115158
Change-Id: Iaf1466242eb31e66d8ace0bec2ffe7554f66fc10
Reviewed-by: Mate Barany <mate.barany@qt.io>
This makes the XML file bigger by a factor of roughly 8, at about 30
MB. Code to read the new data out of it shall follow in a later
commit.
Task-number: QTBUG-115158
Change-Id: I7b9b6abe88be2457fa6cf0e8d7b6a68845136770
Reviewed-by: Mate Barany <mate.barany@qt.io>
This also expands the IANA ID table (by about 5 KiB) even when the
feature is inactive, since it includes all IANA zones referenced by
the new data, as well as those for which CLDR has aliases.
Add code to QTZlocale.cpp to use this locale-independent data. This
shall need to be expanded once locale-dependent data is also available.
Task-number: QTBUG-115158
Change-Id: I720f10cb9ae4cf87dfd8bb66af965a45d49c389a
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Use CDATA when outside ASCII. Share the attribute-packing code for an
open-tag in a static method. In passing, tweak a comment's text.
Change-Id: Ic8b75afc56d537a1a51d13797c737d4bfcc1f910
Reviewed-by: Mate Barany <mate.barany@qt.io>
The msLandZones/territorycode and msZoneIana/iana can become
attributes of their parent nodes instead of child elements, as
territory codes and IANA IDs are plain ASCII and not unduly long.
In the process, rename territorycode to territory.
Change-Id: Iab9901da01d15abc8c5db7a7d57f925fce8bb521
Reviewed-by: Mate Barany <mate.barany@qt.io>
Replacing elements for the alias and IANA ID with attributes makes the
table more compact, albeit the ComodRivadavia line is a little long.
(Some existing msLandZones/ianaids lines are longer, though.)
Change-Id: Iab2b55a21857402ad7c863ef33abd241f1d58a8d
Reviewed-by: Mate Barany <mate.barany@qt.io>
This makes the likely subtag part of the file more compact.
Introduces a QLocaleXmlWriter.asTag() for attribute-only elements;
this requires the Spacer to recognize self-closing elements as not
increasing the indent needed.
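
A rough sketch of the indentation rule, assuming one element boundary per line; IndentTracker and its regular expressions are hypothetical stand-ins, not the real Spacer:

```python
import re

# An opening tag that is not a closing tag, comment, declaration or
# self-closing element (the look-behind rejects a '/' just before '>').
OPEN = re.compile(r"<[^/!?][^<>]*(?<!/)>")
CLOSE = re.compile(r"</[^<>]*>")

class IndentTracker:
    """Prefix each line with indentation matching its nesting depth."""
    def __init__(self, unit=" "):
        self.unit, self.depth = unit, 0

    def __call__(self, line):
        line = line.strip()
        opens = len(OPEN.findall(line))
        closes = len(CLOSE.findall(line))
        if closes > opens:  # dedent before a line that closes its parent
            self.depth -= closes - opens
        out = self.unit * self.depth + line
        if opens > closes:  # indent whatever follows an unclosed opener
            self.depth += opens - closes
        return out

spacer = IndentTracker("  ")
source = ["<list>", '<item a="1"/>', "<name>x</name>", "</list>"]
indented = [spacer(line) for line in source]
```

Here the self-closing item counts as neither an opener nor a closer, so the line after it keeps the same indent.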
Change-Id: I1b73b755f9841617a5c002cf624785321e808d0c
Reviewed-by: Mate Barany <mate.barany@qt.io>
The existing naming lists provide the needed mapping and this prepares
the way to move the language, script and territory into the from and
to elements as attributes, saving some file-size. It incidentally
pushes the mapping to enum values upstream and simplifies the
downstream processing.
Change-Id: I8f6d2615d52b14d46d1b795539c71f8afdc310ca
Reviewed-by: Dennis Oberst <dennis.oberst@qt.io>
These were written (and empty for the Any* enum members) but never
read. We, in any case, infer what we need from the enum members, via
the languageList, scriptList and territoryList elements.
In the process, add a comment between the fromXml() and toXml()
methods of Locale to remind those editing the code to also edit the
schema describing the XML.
Change-Id: Ie5e51f594c2636802eefd8159954105718d9af52
Reviewed-by: Øystein Heskestad <oystein.heskestad@qt.io>
Reviewed-by: Mate Barany <mate.barany@qt.io>
This means LocaleDataWriter.likelySubtags() now only gets an iterable,
so doesn't know when it's on the last item to skip the comma after it,
but that seems to be acceptable in modern C++.
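
For illustration, a writer that only has an iterable can emit the comma unconditionally; table_lines() and the row layout here are hypothetical, not the actual LocaleDataWriter code:

```python
def table_lines(entries):
    """Yield one initializer line per entry, each ending in a comma."""
    for lang, script, land in entries:
        yield f"    {{ {lang}, {script}, {land} }},"

# A generator has no len(), so the writer can't tell which row is last.
rows = (row for row in [(1, 2, 3), (4, 5, 6)])
lines = list(table_lines(rows))
```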
Change-Id: I9d3bb9af3bb46b28b7a2529e27ab72a72c358503
Reviewed-by: Mate Barany <mate.barany@qt.io>
The id and code are reliably pure ASCII with no special characters, so
can safely be expressed as attributes. Extend the reader and writer
classes to handle using attributes on a simple text element.
This leaves only the name as text content, so skip the extra
<name>...</name> layer. As the resulting element is inside a *List
element that tells us whether it's a language, script or territory we
don't need to have different elements and can unify them all as simply
a <naming id="..." code="...">...</naming> element. This makes these
sections of the XML file considerably terser, with no change to the
generated data.
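
As a sketch of what reading the unified form might look like, using the standard library's ElementTree rather than the project's own reader (the sample ids, codes and names are made up, and read_namings() is a hypothetical helper):

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape described above.
sample = """<languageList>
  <naming id="1" code="xx">First Example</naming>
  <naming id="2" code="yy">Second Example</naming>
</languageList>"""

def read_namings(xml_text):
    """Return (id, code, name) for each <naming> under the list element."""
    root = ET.fromstring(xml_text)
    return [(int(e.get("id")), e.get("code"), e.text)
            for e in root.iter("naming")]

entries = read_namings(sample)
```

The enclosing list element, not the element name, is what says whether an entry is a language, script or territory.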
Change-Id: Id2e884f1d2713341524549cc49253eb33b5aa487
Reviewed-by: Mate Barany <mate.barany@qt.io>
One character instead of four adds up to a lot of saved bytes when a
file has many lines, and the timezone name L10n data is going to add a
lot of lines.
Task-number: QTBUG-115158
Change-Id: I856f3771266a70b7a9ef4078a9b4aecf42315831
Reviewed-by: Mate Barany <mate.barany@qt.io>
Make our encoding explicit and enable more tools to understand what
they're looking at.
Change-Id: I29327364a5eaac51eeda9a4fb3b8e9b7527ca488
Reviewed-by: Ivan Solovev <ivan.solovev@qt.io>
Also move the CLDR version into the tag. The version numbers are plain
ASCII, with no special characters, so can safely be attributes. In
the process, fix a mistake in __openTag()'s handling of attributes;
join with plain space, no comma.
Having the Qt version in the XML makes it possible to assert
compatibility between the Qt version that generated it and the one
that's consuming it.
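
A minimal sketch of joining attributes with plain spaces, assuming string-valued attributes; open_tag() and the attribute names are hypothetical and do not reproduce the real __openTag():

```python
from xml.sax.saxutils import quoteattr

def open_tag(tag, **attrs):
    """Build an opening tag, separating attributes by single spaces."""
    parts = [tag] + [f"{key}={quoteattr(value)}" for key, value in attrs.items()]
    return "<" + " ".join(parts) + ">"

# Attribute names and version strings are placeholders.
header = open_tag("root", cldr="46", qt="6.9")
bare = open_tag("root")
```

XML separates attributes by whitespace only, so a comma between them would make the tag malformed.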
Change-Id: I6fa6b668b072ff3616955d81af2cffaba5b67250
Reviewed-by: Mate Barany <mate.barany@qt.io>
The duplicate entries just bulked up the intermediate file.
Makes no change to generated data.
Task-number: QTBUG-115158
Change-Id: I6dc0d1f79f8dcf2e46264c6f9d1ae06ff4c91394
Reviewed-by: Mate Barany <mate.barany@qt.io>
It was setting *_code='0' for the Any* forms of language, script and
territory; this is wrong, the codes for these are all empty or other
special tokens (like 'und', 'Zzzz', 'ZZ'). The IDs for them are zero,
as an int not a string, but were omitted. Also add the variant
details, for all that they're currently unused, for consistency.
This makes no difference to the generated data.
Task-number: QTBUG-115158
Change-Id: I339d1b201e50e2bbc510758ffbbaae0fa02277d4
Reviewed-by: Mate Barany <mate.barany@qt.io>
The qlocalexml.py Locale.C() had to replicate a whole lot of data that
isn't really relevant to how C differs from en_US and every addition
to what we support required further additions to it. So pass the en_US
Locale object to the pseudoconstructor so that C can inherit from it
and only override the parts where we care about the difference.
Hand-code shortening for short Jalali month names, to match Soroush's
original contribution, and include the narrow forms in the hard-coded
data to keep the generated data unchanged (for now). Note some of the
departures from CLDR; we may want to drop these overrides later.
In the process, convert the mapping from keys to locales to
consistently use IDs for all members of the key, instead of using the
(empty) code value for (as yet unused) variant; it now gets ID 0 and
is consistent with returns from codesToIdNames(). This makes life
easier for the code that now has to construct an en_US key.
Task-number: QTBUG-115158
Change-Id: I3d7acb6a4059daec1bba341fcf015c39c7a6803b
Reviewed-by: Kai Köhne <kai.koehne@qt.io>
Omit parentheses round what python will form into a tuple anyway.
Include trailing commas on last entries of tuples so that adding future
entries doesn't drag the existing line into their diffs.
Let the writer's tag-opener handle attributes, if supplied.
Clean up spacing in some doc-strings.
This is all preparation for further changes, to limit their diffs.
Change-Id: I989ae28bbd235b2af9c1d72467d4741c4f1f20ae
Reviewed-by: Mate Barany <mate.barany@qt.io>
Future work shall need the timezone alias data to be synchronized
between the (expanded) locale-independent timezone data and the
(coming) locale-dependent timezone data. The latter shall need to come
via QLocaleXml, hence the former now needs to, too.
This makes no change to the generated data, aside from changing the
regeneration instructions for qtimezoneprivate_data_p.h, to use the
same scripts as locale data, instead of cldr2qtimezone.py, which is
now removed.
Task-number: QTBUG-115158
Change-Id: I47ddd95f6af1855cbb1f601e9074c13f213cd61c
Reviewed-by: Mate Barany <mate.barany@qt.io>
The QLocale XML reader was passing datetime formats through a format
conversion despite the data being converted at the point where we read
it from CLDR. It turns out this was needed because the long date and
time formats in our hard-coded data for the C Locale object used CLDR
format strings, unlike all other Locale objects. Fix those two formats
in the C locale and remove the redundant processing step. This, in
turn, enables the parser to include the date and time formats in its
general handling of most fields that it reads.
This does not result in any change to the generated data QLocale uses
(although it does change the intermediate QLocale XML file).
Task-number: QTBUG-115158
Change-Id: Iaf9da206158043dda2e9e5a3790f009b100e46b4
Reviewed-by: Mate Barany <mate.barany@qt.io>
It has many grumbles about spacing, but at least this code is
currently consistent about its departure from PEP8's spacing rules
(and closer to Qt's) for the present. We can review whether to do a
drastic spacing revolution later.
Change-Id: Ife4e8a5b02b63434bd9c7ac7ba4cbc11b6311f9f
Reviewed-by: Mate Barany <mate.barany@qt.io>
They're a bit more readable than calling dict on a generator.
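
For comparison, the two spellings side by side (the sample pairs are made up):

```python
pairs = [("xx", "First Example"), ("yy", "Second Example")]

# Calling dict() on a generator of pairs:
via_generator = dict((code, name) for code, name in pairs)

# The equivalent dict comprehension:
via_comprehension = {code: name for code, name in pairs}
```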
Change-Id: I3177e31b1f617b80d1cf5d5f83df7036fc0c4c01
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
Various comments need to continue using the enumdata.py names, as they
associate data with particular enum members, but we can now correctly
use the en.xml versions of their names when we report them, rather
than the enum-friendly names we use in the code. Since this now means
the data may stray outside plain ASCII - it'll be UTF-8-encoded - this
implies replacing the QLatin1StringView()s of the code that formerly
read this data with QString::fromUtf8().
Fixes: QTBUG-94460
Change-Id: Id3b08875a46af58c0555c3e303b0e15a19441509
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
The former needed the latter's .dupes to do the job, so can now just
take a method as a tool to do the job instead, letting .dupes become
private. In the process refine the munging to free enumdata.py from
having to capitalize each word in its names. This will, in due course,
let us use more natural forms in various comments. This causes no
change to generated data.
Update enumdata.py's introduction doc, mainly to reflect this but also
fixing the out-of-date names (old *_list have long been *_map) and
adding some details to other paragraphs.
Task-number: QTBUG-94460
Change-Id: If195b2e94a53a495fc4f1f216bed07a910439fa7
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
The script and territory to exclude from reports about unused ones
were swapped, so we excluded a territory from the script list (which
didn't contain it anyway) and vice versa.
The test for whether to report used the non-existent .territories
attribute by mistake for .__territories.
Change-Id: I29e9d9f8f34883d7c3a5ac15470d9e7a0366e3db
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
Replace the current license disclaimer in files with
an SPDX-License-Identifier.
Files that have to be modified by hand are modified.
License files are organized under LICENSES directory.
Task-number: QTBUG-67283
Change-Id: Id880c92784c40f3bbde861c0d93f58151c18b9f1
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Jörg Bornemann <joerg.bornemann@qt.io>
Replace most uses of str.format() and string arithmetic by f-strings.
This results in more compact code and the code is easier to read
when using an appropriate editor.
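
A small illustration of the three styles producing the same text (the values are placeholders):

```python
tag, ident, code = "naming", 7, "xx"

by_format = '<{} id="{}" code="{}">'.format(tag, ident, code)
by_arithmetic = '<' + tag + ' id="' + str(ident) + '" code="' + code + '">'
by_fstring = f'<{tag} id="{ident}" code="{code}">'
```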
Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: I3409f745b5d0324985cbd5690f5eda8d09b869ca
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
The schema is in RelaxNG Compact syntax. It can be used to validate
files produced by the cldr2qlocalexml.py script and also gives an
overview of the file format.
Change-Id: I344978f2201c5e67e236ab580a12ad33262f33cb
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
This way the output is easier to compare between versions.
Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: If4053c574c4ad200a179b06276bd889f2cb9e1c6
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Output of cldr2qlocalexml.py looks weird without the final newline.
Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: I5d675e475c57cdc8101887c39052007ba0a19857
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
We should at least know when members of QLocale's enums aren't adding
any value, and it may make sense to deprecate the unused ones.
Change-Id: Icf202f81d2a35904c13ccdc202d41985bcb3f2e6
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
Change the nomenclature used in the scripts and the QLocaleXML data
format to use "territory" and "territories" in place of "country" and
"countries". Does not change the generated source files.
Change-Id: I4b208d8d01ad2bfc70d289fa6551f7e0355df5ef
Reviewed-by: JiDe Zhang <zhangjide@uniontech.com>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
These variables provide mappings, not lists, so name them non-deceptively.
Change-Id: Idf15e78ad73790bc86dd8b9d4f248d1c4f73993c
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
The only reason cldr.py imported enumdata was so as to pass what it
imported to writer.enumData(); that method might as well do the import
itself.
Change-Id: Ie77dcd29058f926b8cca4deef35837f30505859f
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>