Revise Windows time-zone mapping to use proper IANA IDs

The CLDR's "IANA" IDs may (for the sake of stability) date back to
before IANA's own naming has been updated. As a result, the "IANA" IDs
we were using were in some cases out of date. CLDR does provide a
mapping from its stable IDs to all aliases and the current IANA name
for each (which I shall soon be needing in other work), so use that to
map the CLDR IDs to contemporary IANA ones.

Revise the documentation of CldrAccess.readWindowsTimeZones() to take
this into account, pass it the alias mapping from the table, use that
to map IDs internally and, in passing, rename a variable.  Update
cldr2qtimezone.py to match the new CldrAccess methods and regenerate
the data.

Change-Id: I23d8a7d048d76392099d125376b544a41faf7eb3
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Mate Barany <mate.barany@qt.io>
Edward Welbourne 2024-02-01 15:13:02 +01:00
parent e2e5ab0932
commit 99475db542
3 changed files with 832 additions and 813 deletions

File diff suppressed because it is too large

View File

@@ -458,29 +458,40 @@ enumdata.py (keeping the old name as an alias):
 
         return alias, naming
 
-    def readWindowsTimeZones(self, lookup): # For use by cldr2qtimezone.py
+    def readWindowsTimeZones(self, lookup, alias): # For use by cldr2qtimezone.py
         """Digest CLDR's MS-Win time-zone name mapping.
 
         MS-Win have their own eccentric names for time-zones. CLDR
-        helpfully provides a translation to more orthodox names.
+        helpfully provides a translation to more orthodox names,
+        albeit these are CLDR IDs - see bcp47Aliases() - rather than
+        (up to date) IANA IDs. The windowsZones.xml supplement has
+        supplementalData/windowsZones/mapTimezones/mapZone nodes with
+        attributes
 
-        Single argument, lookup, is a mapping from known MS-Win names
-        for locales to a unique integer index (starting at 1).
+          territory -- using 001 (World) for 'default'
+          type -- space-joined sequence of CLDR IDs of zones
+          other -- Windows name of these zones in the given territory
 
-        The XML structure we read has the form:
+        First argument, lookup, is a mapping from known MS-Win names
+        for timezones to a unique integer index (starting at 1). Second
+        argument, alias, should be the first part of the pair returned
+        by a call to bcp47Aliases(); it shall be used to transform
+        CLDR IDs into IANA IDs.
 
-        <supplementalData>
-            <windowsZones>
-                <mapTimezones otherVersion="..." typeVersion="...">
-                    <!-- (UTC-08:00) Pacific Time (US & Canada) -->
-                    <mapZone other="Pacific Standard Time" territory="001" type="America/Los_Angeles"/>
-                    <mapZone other="Pacific Standard Time" territory="CA" type="America/Vancouver America/Dawson America/Whitehorse"/>
-                    <mapZone other="Pacific Standard Time" territory="US" type="America/Los_Angeles America/Metlakatla"/>
-                    <mapZone other="Pacific Standard Time" territory="ZZ" type="PST8PDT"/>
-                </mapTimezones>
-            </windowsZones>
-        </supplementalData>
-        """
+        For each mapZone node, its territory is mapped to a
+        QLocale::Territory enum with numeric value code e, its other
+        is mapped through lookup to obtain an MS-Win name index k and
+        its type is split on spacing and cleaned up as follows. Each
+        entry in type is mapped, via alias (if present in it) to get a
+        list of IANA IDs, omitting any later duplicates from earlier
+        entries; the result list of IANA IDs is joined with spaces
+        between to give a string s.
+
+        Returns a triple (version, defaults, windows) in which version
+        is the version of CLDR in use, defaults is a mapping {k: s}
+        and windows is a mapping {(k, e): b} in which b maps
+        'windowsId' to the Windows name of the zone (the node's other
+        attribute), 'territoryCode' to e and 'ianaList' to s."""
         zones = self.supplement('windowsZones.xml')
         enum = self.__enumMap('territory')
         badZones, unLands, defaults, windows = set(), set(), {}, {}
@@ -490,9 +501,17 @@ enumdata.py (keeping the old name as an alias):
                 continue
 
             wid, code = attrs['other'], attrs['territory']
+            cldrs, ianas = attrs['type'].split(), []
+            for cldr in cldrs:
+                if cldr in alias:
+                    iana = alias[cldr]
+                    if iana not in ianas:
+                        ianas.append(iana)
+                else:
+                    ianas.append(cldr)
             data = dict(windowsId = wid,
                         territoryCode = code,
-                        ianaList = ' '.join(attrs['type'].split()))
+                        ianaList = ' '.join(ianas))
 
             try:
                 key = lookup[wid]
@@ -505,12 +524,12 @@ enumdata.py (keeping the old name as an alias):
                 defaults[key] = data['ianaList']
             else:
                 try:
-                    cid, name = enum[code]
+                    land, name = enum[code]
                 except KeyError:
                     unLands.append(code)
                     continue
-                data.update(territoryId = cid, territory = name)
-                windows[key, cid] = data
+                data.update(territoryId = land, territory = name)
+                windows[key, land] = data
 
         if unLands:
             raise Error('Unknown territory codes, please add to enumdata.py: '

View File

@@ -173,7 +173,8 @@ def main(out, err):
 
     try:
         version, defaults, winIds = access.readWindowsTimeZones(
-            {name: ind for ind, name in enumerate((x[0] for x in windowsIdList), 1)})
+            {name: ind for ind, name in enumerate((k for k, v in windowsIdList), 1)},
+            alias)
     except IOError as e:
         parser.error(
             f'Failed to open common/supplemental/windowsZones.xml: {e}')
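
For reference, the lookup argument built in the call above maps each Windows ID to a 1-based index. A minimal sketch follows, assuming windowsIdList is a sequence of pairs whose first member is the Windows ID; the sample entries and the placeholder second members are invented for illustration, since the real table is not shown in this diff.

```python
# Hypothetical stand-in for windowsIdList; the second member of each
# pair is a placeholder whose real meaning is not shown in the diff.
windowsIdList = [
    ('Pacific Standard Time', None),
    ('UTC', None),
]

# Same comprehension as in the diff: number the Windows IDs from 1.
lookup = {name: ind
          for ind, name in enumerate((k for k, v in windowsIdList), 1)}

print(lookup)
```

Unpacking each pair as k, v rather than indexing with x[0] makes the comprehension fail loudly if an entry does not have exactly two members.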