Rework enumdata.py's comments

Turn the large comment at the start into a doc-string and add some
more details to it. Fix the Ivory Coast comment's indent and a typo in
it.

Change-Id: I36b4e5094d3c3d5c5b91809424b424bcac5daafa
Reviewed-by: Friedemann Kleint <Friedemann.Kleint@qt.io>
This commit is contained in:
Edward Welbourne 2024-03-11 18:13:57 +01:00
parent 4017bc5adc
commit a643a956d4

View File

@ -1,34 +1,60 @@
# Copyright (C) 2021 The Qt Company Ltd.
# SPDX-License-Identifier: LicenseRef-Qt-Commercial OR GPL-3.0-only WITH Qt-GPL-exception-1.0
# A run of cldr2qlocalexml.py will produce output reporting any
# language, script and territory codes it sees, in data, for which it
# can find a name (taken always from en.xml) that could potentially be
# used. There is no point adding a mapping for such a code unless the
# CLDR's common/main/ contains an XML file for at least one locale
# that exercises it (and little point absent substantial data).
"""Assorted enumerations implicated in public API.
# Each *_map reflects the current values of its enums in qlocale.h; if
# new xml language files are available in CLDR, these languages and
# territories need to be *appended* to this list (for compatibility
# between versions). Include any spaces and dashes present in names
# (they'll be squished them out for the enum entries) in *_map, but
# use the squished forms of names in the *_aliases mappings. The
# squishing also turns the first letter of each word into a capital so
# you can safely preserve the case of en.xml's name; but omit (or
# replace with space) any punctuation aside from dashes and map any
# accented letters to their un-accented plain ASCII.
The numberings of these enumerations can only change at major
versions. When new CLDR data implies adding entries, the new ones must
go after all existing ones. See also zonedata.py for enumerations
related to timezones and CLDR, which can more freely be changed
between versions.
# For a new major version (and only then), we can change the
# numbering, so re-sort each list into alphabetic order (e.g. using
# sort -k2); but keep the Any and C entries first. That's why those
# are offset with a blank line, below. After doing that, regenerate
# locale data as usual; this will cause a binary-incompatible change.
A run of cldr2qlocalexml.py will produce output reporting any
language, script and territory codes it sees, in data, for which it
can find a name (taken always from en.xml) that could potentially be
used. There is no point adding a mapping for such a code unless the
CLDR's common/main/ contains an XML file for at least one locale that
exercises it (and little point, even then, absent substantial data,
ignoring draft='unconfirmed' entries).
# Note on "macrolanguage" comments: see QTBUG-107781 and "ISO 639
# macrolanguage" on Wikipedia. A "macrolanguage" is (loosely-speaking)
# a group of languages so closely related to one another that they
# could also be regarded as divergent dialects of the macrolanguage.
Each *_map reflects the current values of its enums in qlocale.h; if
new xml language files are available in CLDR, these languages and
territories need to be *appended* to this list (for compatibility
between versions). Include any spaces and dashes present in names
(they'll be squished out for the enum entries) in *_map, but use the
squished forms of names in the *_aliases mappings. The squishing also
turns the first letter of each word into a capital so you can safely
preserve the case of en.xml's name; but omit (or replace with space)
any punctuation aside from dashes and map any accented letters to
their un-accented plain ASCII. The two tables, for each enum, have
the forms:
* map { Numeric value: ("Proper name", "ISO code") }
* alias { "OldName": "CurrentName" }
TODO: add support for marking entries as deprecated from a specified
version. For aliases that merely deprecates the name. Where we have a
name for which CLDR offers no data, we may also want to deprecate
entries in the map - although they may be worth keeping for the
benefit of QLocaleSelector (see QTBUG-112765), if other
locale-specific resources might have use of them.
For a new major version (and only then), we can change the numbering,
so re-sort each list into alphabetic order (e.g. using sort -k2); but
keep the Any and C entries first. That's why those are offset with a
blank line, below. After doing that, regenerate locale data as usual;
this will cause a binary-incompatible change.
Note on 'macrolanguage' comments: see QTBUG-107781 and 'ISO 639
macrolanguage' on Wikipedia. A 'macrolanguage' is (loosely-speaking) a
group of languages so closely related to one another that they could
also be regarded as divergent dialects of the macrolanguage. In some
cases this may mean a resource (such as translation or text-to-speech
data) may describe itself as pertaining to the macrolanguage, implying
its suitability for use in any of the languages within the
macrolanguage. For example, no_NO might be used for a generic
Norwegian resource, embracing both nb_NO and nn_NO.
"""
language_map = {
0: ("AnyLanguage", " "),
@ -530,9 +556,9 @@ territory_map = {
115: ("Isle of Man", "IM"),
116: ("Israel", "IL"),
117: ("Italy", "IT"),
# Officially Côte dIvoire, which we'd ned to map to CotedIvoire
# or CoteDIvoire, either failing to make the d' separate from
# Cote or messing with its case. So stick with Ivory Coast:
# Officially Côte dIvoire, which we'd need to map to CotedIvoire
# or CoteDIvoire, either failing to make the d' separate from Cote
# or messing with its case. So stick with Ivory Coast:
118: ("Ivory Coast", "CI"),
119: ("Jamaica", "JM"),
120: ("Japan", "JP"),