tools: support full-icu by default

Instead of an English-only icudt64l.dat in the repo,
we now have icudt64l.dat.gz with all locales.

- updated READMEs and docs
- shrinker now copies source, and compresses (bzip2) the ICU data file
- configure expects deps/icu-small to be full ICU with a full
compressed data file

Fixes: https://github.com/nodejs/node/issues/19214
Co-Authored-By: Richard Lau <riclau@uk.ibm.com>
Co-Authored-By: Jan Olaf Krems <jan.krems@gmail.com>
Co-Authored-By: James M Snell <jasnell@gmail.com>
PR-URL: https://github.com/nodejs/node/pull/29522

Reviewed-By: Jan Krems <jan.krems@gmail.com>
Reviewed-By: Jiawen Geng <technicalcute@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
This commit is contained in:
Steven R. Loomis 2019-09-30 11:17:49 -07:00 committed by Rich Trott
parent a71fb978a4
commit 1a25e901b7
11 changed files with 195 additions and 136 deletions

View File

@ -35,21 +35,23 @@ file a new issue.
* [Building Node.js](#building-nodejs-1) * [Building Node.js](#building-nodejs-1)
* [Android/Android-based devices (e.g. Firefox OS)](#androidandroid-based-devices-eg-firefox-os) * [Android/Android-based devices (e.g. Firefox OS)](#androidandroid-based-devices-eg-firefox-os)
* [`Intl` (ECMA-402) support](#intl-ecma-402-support) * [`Intl` (ECMA-402) support](#intl-ecma-402-support)
* [Default: `small-icu` (English only) support](#default-small-icu-english-only-support)
* [Build with full ICU support (all locales supported by ICU)](#build-with-full-icu-support-all-locales-supported-by-icu) * [Build with full ICU support (all locales supported by ICU)](#build-with-full-icu-support-all-locales-supported-by-icu)
* [Unix/macOS](#unixmacos) * [Unix/macOS](#unixmacos)
* [Windows](#windows-1) * [Windows](#windows-1)
* [Building without Intl support](#building-without-intl-support) * [Trimmed: `small-icu` (English only) support](#trimmed-small-icu-english-only-support)
* [Unix/macOS](#unixmacos-1) * [Unix/macOS](#unixmacos-1)
* [Windows](#windows-2) * [Windows](#windows-2)
* [Use existing installed ICU (Unix/macOS only)](#use-existing-installed-icu-unixmacOS-only) * [Building without Intl support](#building-without-intl-support)
* [Build with a specific ICU](#build-with-a-specific-icu)
* [Unix/macOS](#unixmacos-2) * [Unix/macOS](#unixmacos-2)
* [Windows](#windows-3) * [Windows](#windows-3)
* [Building Node.js with FIPS-compliant OpenSSL](#building-nodejs-with-fips-compliant-openssl) * [Use existing installed ICU (Unix/macOS only)](#use-existing-installed-icu-unixmacOS-only)
* [Building Node.js with external core modules](#building-nodejs-with-external-core-modules) * [Build with a specific ICU](#build-with-a-specific-icu)
* [Unix/macOS](#unixmacos-3) * [Unix/macOS](#unixmacos-3)
* [Windows](#windows-4) * [Windows](#windows-4)
* [Building Node.js with FIPS-compliant OpenSSL](#building-nodejs-with-fips-compliant-openssl)
* [Building Node.js with external core modules](#building-nodejs-with-external-core-modules)
* [Unix/macOS](#unixmacos-4)
* [Windows](#windows-5)
* [Note for downstream distributors of Node.js](#note-for-downstream-distributors-of-nodejs) * [Note for downstream distributors of Node.js](#note-for-downstream-distributors-of-nodejs)
## Supported platforms ## Supported platforms
@ -598,31 +600,40 @@ $ make
## `Intl` (ECMA-402) support ## `Intl` (ECMA-402) support
[Intl](https://github.com/nodejs/node/blob/master/doc/api/intl.md) support is [Intl](https://github.com/nodejs/node/blob/master/doc/api/intl.md) support is
enabled by default, with English data only. enabled by default.
### Default: `small-icu` (English only) support
By default, only English data is included, but
the full `Intl` (ECMA-402) APIs. It does not need to download
any dependencies to function. You can add full
data at runtime.
### Build with full ICU support (all locales supported by ICU) ### Build with full ICU support (all locales supported by ICU)
With the `--download=all`, this may download ICU if you don't have an This is the default option.
ICU in `deps/icu`. (The embedded `small-icu` included in the default
Node.js source does not include all locales.)
#### Unix/macOS #### Unix/macOS
```console ```console
$ ./configure --with-intl=full-icu --download=all $ ./configure --with-intl=full-icu
``` ```
#### Windows #### Windows
```console ```console
> .\vcbuild full-icu download-all > .\vcbuild full-icu
```
### Trimmed: `small-icu` (English only) support
In this configuration, only English data is included, but
the full `Intl` (ECMA-402) APIs. It does not need to download
any dependencies to function. You can add full data at runtime.
#### Unix/macOS
```console
$ ./configure --with-intl=small-icu
```
#### Windows
```console
> .\vcbuild small-icu
``` ```
### Building without Intl support ### Building without Intl support

View File

@ -11,6 +11,8 @@ import re
import shlex import shlex
import subprocess import subprocess
import shutil import shutil
import bz2
from distutils.spawn import find_executable as which from distutils.spawn import find_executable as which
# If not run from node/, cd to node/. # If not run from node/, cd to node/.
@ -409,7 +411,7 @@ parser.add_option('--use-largepages-script-lld',
intl_optgroup.add_option('--with-intl', intl_optgroup.add_option('--with-intl',
action='store', action='store',
dest='with_intl', dest='with_intl',
default='small-icu', default='full-icu',
choices=valid_intl_modes, choices=valid_intl_modes,
help='Intl mode (valid choices: {0}) [default: %default]'.format( help='Intl mode (valid choices: {0}) [default: %default]'.format(
', '.join(valid_intl_modes))) ', '.join(valid_intl_modes)))
@ -1399,7 +1401,8 @@ def configure_intl(o):
icu_parent_path = 'deps' icu_parent_path = 'deps'
# The full path to the ICU source directory. Should not include './'. # The full path to the ICU source directory. Should not include './'.
icu_full_path = 'deps/icu' icu_deps_path = 'deps/icu'
icu_full_path = icu_deps_path
# icu-tmp is used to download and unpack the ICU tarball. # icu-tmp is used to download and unpack the ICU tarball.
icu_tmp_path = os.path.join(icu_parent_path, 'icu-tmp') icu_tmp_path = os.path.join(icu_parent_path, 'icu-tmp')
@ -1407,30 +1410,26 @@ def configure_intl(o):
# canned ICU. see tools/icu/README.md to update. # canned ICU. see tools/icu/README.md to update.
canned_icu_dir = 'deps/icu-small' canned_icu_dir = 'deps/icu-small'
# use the README to verify what the canned ICU is
canned_is_full = os.path.isfile(os.path.join(canned_icu_dir, 'README-FULL-ICU.txt'))
canned_is_small = os.path.isfile(os.path.join(canned_icu_dir, 'README-SMALL-ICU.txt'))
if canned_is_small:
warn('Ignoring %s - in-repo small icu is no longer supported.' % canned_icu_dir)
# We can use 'deps/icu-small' - pre-canned ICU *iff* # We can use 'deps/icu-small' - pre-canned ICU *iff*
# - with_intl == small-icu (the default!) # - canned_is_full AND
# - with_icu_locales == 'root,en' (the default!)
# - deps/icu-small exists!
# - with_icu_source is unset (i.e. no other ICU was specified) # - with_icu_source is unset (i.e. no other ICU was specified)
# (Note that this is the *DEFAULT CASE*.)
# #
# This is *roughly* equivalent to # This is *roughly* equivalent to
# $ configure --with-intl=small-icu --with-icu-source=deps/icu-small # $ configure --with-intl=full-icu --with-icu-source=deps/icu-small
# .. Except that we avoid copying icu-small over to deps/icu. # .. Except that we avoid copying icu-small over to deps/icu.
# In this default case, deps/icu is ignored, although make clean will # In this default case, deps/icu is ignored, although make clean will
# still harmlessly remove deps/icu. # still harmlessly remove deps/icu.
# are we using default locales? if (not with_icu_source) and canned_is_full:
using_default_locales = ( options.with_icu_locales == icu_default_locales )
# make sure the canned ICU really exists
canned_icu_available = os.path.isdir(canned_icu_dir)
if (o['variables']['icu_small'] == b(True)) and using_default_locales and (not with_icu_source) and canned_icu_available:
# OK- we can use the canned ICU. # OK- we can use the canned ICU.
icu_config['variables']['icu_small_canned'] = 1
icu_full_path = canned_icu_dir icu_full_path = canned_icu_dir
icu_config['variables']['icu_full_canned'] = 1
# --with-icu-source processing # --with-icu-source processing
# now, check that they didn't pass --with-icu-source=deps/icu # now, check that they didn't pass --with-icu-source=deps/icu
elif with_icu_source and os.path.abspath(icu_full_path) == os.path.abspath(with_icu_source): elif with_icu_source and os.path.abspath(icu_full_path) == os.path.abspath(with_icu_source):
@ -1508,29 +1507,40 @@ def configure_intl(o):
icu_endianness = sys.byteorder[0] icu_endianness = sys.byteorder[0]
o['variables']['icu_ver_major'] = icu_ver_major o['variables']['icu_ver_major'] = icu_ver_major
o['variables']['icu_endianness'] = icu_endianness o['variables']['icu_endianness'] = icu_endianness
icu_data_file_l = 'icudt%s%s.dat' % (icu_ver_major, 'l') icu_data_file_l = 'icudt%s%s.dat' % (icu_ver_major, 'l') # LE filename
icu_data_file = 'icudt%s%s.dat' % (icu_ver_major, icu_endianness) icu_data_file = 'icudt%s%s.dat' % (icu_ver_major, icu_endianness)
# relative to configure # relative to configure
icu_data_path = os.path.join(icu_full_path, icu_data_path = os.path.join(icu_full_path,
'source/data/in', 'source/data/in',
icu_data_file_l) icu_data_file_l) # LE
compressed_data = '%s.bz2' % (icu_data_path)
if not os.path.isfile(icu_data_path) and os.path.isfile(compressed_data):
# unpack. deps/icu is a temporary path
if os.path.isdir(icu_tmp_path):
shutil.rmtree(icu_tmp_path)
os.mkdir(icu_tmp_path)
icu_data_path = os.path.join(icu_tmp_path, icu_data_file_l)
with open(icu_data_path, 'wb') as outf:
with bz2.BZ2File(compressed_data, 'rb') as inf:
shutil.copyfileobj(inf, outf)
# Now, proceed..
# relative to dep.. # relative to dep..
icu_data_in = os.path.join('..','..', icu_full_path, 'source/data/in', icu_data_file_l) icu_data_in = os.path.join('..','..', icu_data_path)
if not os.path.isfile(icu_data_path) and icu_endianness != 'l': if not os.path.isfile(icu_data_path) and icu_endianness != 'l':
# use host endianness # use host endianness
icu_data_path = os.path.join(icu_full_path, icu_data_path = os.path.join(icu_full_path,
'source/data/in', 'source/data/in',
icu_data_file) icu_data_file) # will be generated
# relative to dep..
icu_data_in = os.path.join('..', icu_full_path, 'source/data/in',
icu_data_file)
# this is the input '.dat' file to use .. icudt*.dat
# may be little-endian if from a icu-project.org tarball
o['variables']['icu_data_in'] = icu_data_in
if not os.path.isfile(icu_data_path): if not os.path.isfile(icu_data_path):
# .. and we're not about to build it from .gyp! # .. and we're not about to build it from .gyp!
error('''ICU prebuilt data file %s does not exist. error('''ICU prebuilt data file %s does not exist.
See the README.md.''' % icu_data_path) See the README.md.''' % icu_data_path)
# this is the input '.dat' file to use .. icudt*.dat
# may be little-endian if from a icu-project.org tarball
o['variables']['icu_data_in'] = icu_data_in
# map from variable name to subdirs # map from variable name to subdirs
icu_src = { icu_src = {
'stubdata': 'stubdata', 'stubdata': 'stubdata',
@ -1547,6 +1557,31 @@ def configure_intl(o):
var = 'icu_src_%s' % i var = 'icu_src_%s' % i
path = '../../%s/source/%s' % (icu_full_path, icu_src[i]) path = '../../%s/source/%s' % (icu_full_path, icu_src[i])
icu_config['variables'][var] = glob_to_var('tools/icu', path, 'patches/%s/source/%s' % (icu_ver_major, icu_src[i]) ) icu_config['variables'][var] = glob_to_var('tools/icu', path, 'patches/%s/source/%s' % (icu_ver_major, icu_src[i]) )
# calculate platform-specific genccode args
# print("platform %s, flavor %s" % (sys.platform, flavor))
# if sys.platform == 'darwin':
# shlib_suffix = '%s.dylib'
# elif sys.platform.startswith('aix'):
# shlib_suffix = '%s.a'
# else:
# shlib_suffix = 'so.%s'
if flavor == 'win':
icu_config['variables']['icu_asm_ext'] = 'obj'
icu_config['variables']['icu_asm_opts'] = [ '-o ' ]
elif with_intl == 'small-icu' or options.cross_compiling:
icu_config['variables']['icu_asm_ext'] = 'c'
icu_config['variables']['icu_asm_opts'] = []
elif flavor == 'mac':
icu_config['variables']['icu_asm_ext'] = 'S'
icu_config['variables']['icu_asm_opts'] = [ '-a', 'gcc-darwin' ]
elif sys.platform.startswith('aix'):
icu_config['variables']['icu_asm_ext'] = 'S'
icu_config['variables']['icu_asm_opts'] = [ '-a', 'xlc' ]
else:
# assume GCC-compatible asm is OK
icu_config['variables']['icu_asm_ext'] = 'S'
icu_config['variables']['icu_asm_opts'] = [ '-a', 'gcc' ]
# write updated icu_config.gypi with a bunch of paths # write updated icu_config.gypi with a bunch of paths
write(icu_config_name, do_not_edit + write(icu_config_name, do_not_edit +
pprint.pformat(icu_config, indent=2) + '\n') pprint.pformat(icu_config, indent=2) + '\n')

8
deps/icu-small/README-FULL-ICU.txt vendored Normal file
View File

@ -0,0 +1,8 @@
ICU sources - auto generated by shrink-icu-src.py
This directory contains the ICU subset used by --with-intl=full-icu
It is a strict subset of ICU 64 source files with the following exception(s):
* deps/icu-small/source/data/in/icudt64l.dat.bz2 : compressed data file
To rebuild this directory, see ../../tools/icu/README.md

View File

@ -1,8 +0,0 @@
Small ICU sources - auto generated by shrink-icu-src.py
This directory contains the ICU subset used by --with-intl=small-icu (the default)
It is a strict subset of ICU 64 source files with the following exception(s):
* deps/icu-small/source/data/in/icudt64l.dat : Reduced-size data file
To rebuild this directory, see ../../tools/icu/README.md

Binary file not shown.

Binary file not shown.

View File

@ -23,11 +23,9 @@ programs. Some of them are:
* [`RegExp` Unicode Property Escapes][] * [`RegExp` Unicode Property Escapes][]
Node.js (and its underlying V8 engine) uses [ICU][] to implement these features Node.js (and its underlying V8 engine) uses [ICU][] to implement these features
in native C/C++ code. However, some of them require a very large ICU data file in native C/C++ code. The full ICU data set is provided by Node.js by default.
in order to support all locales of the world. Because it is expected that most However, due to the size of the ICU data file, several
Node.js users will make use of only a small portion of ICU functionality, only options are provided for customizing the ICU data set either when
a subset of the full ICU data set is provided by Node.js by default. Several
options are provided for customizing and expanding the ICU data set either when
building or running Node.js. building or running Node.js.
## Options for building Node.js ## Options for building Node.js
@ -38,8 +36,8 @@ in [BUILDING.md][].
* `--with-intl=none`/`--without-intl` * `--with-intl=none`/`--without-intl`
* `--with-intl=system-icu` * `--with-intl=system-icu`
* `--with-intl=small-icu` (default) * `--with-intl=small-icu`
* `--with-intl=full-icu` * `--with-intl=full-icu` (default)
An overview of available Node.js and JavaScript features for each `configure` An overview of available Node.js and JavaScript features for each `configure`
option: option:
@ -66,8 +64,8 @@ operation is identical to that of `Date.prototype.toString()`.
### Disable all internationalization features (`none`) ### Disable all internationalization features (`none`)
If this option is chosen, most internationalization features mentioned above If this option is chosen, ICU is disabled and most internationalization
will be **unavailable** in the resulting `node` binary. features mentioned above will be **unavailable** in the resulting `node` binary.
### Build with a pre-installed ICU (`system-icu`) ### Build with a pre-installed ICU (`system-icu`)
@ -106,9 +104,7 @@ console.log(spanish.format(january));
// Should print "enero" // Should print "enero"
``` ```
This mode provides a good balance between features and binary size, and it is This mode provides a balance between features and binary size.
the default behavior if no `--with-intl` flag is passed. The official binaries
are also built in this mode.
#### Providing ICU data at runtime #### Providing ICU data at runtime
@ -149,8 +145,9 @@ enable full `Intl` support.
This option makes the resulting binary link against ICU statically and include This option makes the resulting binary link against ICU statically and include
a full set of ICU data. A binary created this way has no further external a full set of ICU data. A binary created this way has no further external
dependencies and supports all locales, but might be rather large. See dependencies and supports all locales, but might be rather large. This is
[BUILDING.md][BUILDING.md#full-icu] on how to compile a binary using this mode. the default behavior if no `--with-intl` flag is passed. The official binaries
are also built in this mode.
## Detecting internationalization support ## Detecting internationalization support
@ -205,7 +202,6 @@ to be helpful:
[`String.prototype.toUpperCase()`]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toUpperCase [`String.prototype.toUpperCase()`]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toUpperCase
[`require('buffer').transcode()`]: buffer.html#buffer_buffer_transcode_source_fromenc_toenc [`require('buffer').transcode()`]: buffer.html#buffer_buffer_transcode_source_fromenc_toenc
[`require('util').TextDecoder`]: util.html#util_class_util_textdecoder [`require('util').TextDecoder`]: util.html#util_class_util_textdecoder
[BUILDING.md#full-icu]: https://github.com/nodejs/node/blob/master/BUILDING.md#build-with-full-icu-support-all-locales-supported-by-icu
[BUILDING.md]: https://github.com/nodejs/node/blob/master/BUILDING.md [BUILDING.md]: https://github.com/nodejs/node/blob/master/BUILDING.md
[ECMA-262]: https://tc39.github.io/ecma262/ [ECMA-262]: https://tc39.github.io/ecma262/
[ECMA-402]: https://tc39.github.io/ecma402/ [ECMA-402]: https://tc39.github.io/ecma402/

View File

@ -932,26 +932,9 @@ Per the [WHATWG Encoding Standard][], the encodings supported by the
one or more aliases may be used. one or more aliases may be used.
Different Node.js build configurations support different sets of encodings. Different Node.js build configurations support different sets of encodings.
While a very basic set of encodings is supported even on Node.js builds without (see [Internationalization][])
ICU enabled, support for some encodings is provided only when Node.js is built
with ICU and using the full ICU data (see [Internationalization][]).
#### Encodings Supported Without ICU #### Encodings Supported by Default (With Full ICU Data)
| Encoding | Aliases |
| ----------- | --------------------------------- |
| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
| `'utf-16le'` | `'utf-16'` |
#### Encodings Supported by Default (With ICU)
| Encoding | Aliases |
| ----------- | --------------------------------- |
| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
| `'utf-16le'` | `'utf-16'` |
| `'utf-16be'` | |
#### Encodings Requiring Full ICU Data
| Encoding | Aliases | | Encoding | Aliases |
| ----------------- | -------------------------------- | | ----------------- | -------------------------------- |
@ -990,6 +973,21 @@ with ICU and using the full ICU data (see [Internationalization][]).
| `'shift_jis'` | `'csshiftjis'`, `'ms932'`, `'ms_kanji'`, `'shift-jis'`, `'sjis'`, `'windows-31j'`, `'x-sjis'` | | `'shift_jis'` | `'csshiftjis'`, `'ms932'`, `'ms_kanji'`, `'shift-jis'`, `'sjis'`, `'windows-31j'`, `'x-sjis'` |
| `'euc-kr'` | `'cseuckr'`, `'csksc56011987'`, `'iso-ir-149'`, `'korean'`, `'ks_c_5601-1987'`, `'ks_c_5601-1989'`, `'ksc5601'`, `'ksc_5601'`, `'windows-949'` | | `'euc-kr'` | `'cseuckr'`, `'csksc56011987'`, `'iso-ir-149'`, `'korean'`, `'ks_c_5601-1987'`, `'ks_c_5601-1989'`, `'ksc5601'`, `'ksc_5601'`, `'windows-949'` |
#### Encodings Supported when Node.js is built with the `small-icu` option
| Encoding | Aliases |
| ----------- | --------------------------------- |
| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
| `'utf-16le'` | `'utf-16'` |
| `'utf-16be'` | |
#### Encodings Supported when ICU is disabled
| Encoding | Aliases |
| ----------- | --------------------------------- |
| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
| `'utf-16le'` | `'utf-16'` |
The `'iso-8859-16'` encoding listed in the [WHATWG Encoding Standard][] The `'iso-8859-16'` encoding listed in the [WHATWG Encoding Standard][]
is not supported. is not supported.
@ -1005,9 +1003,9 @@ changes:
* `encoding` {string} Identifies the `encoding` that this `TextDecoder` instance * `encoding` {string} Identifies the `encoding` that this `TextDecoder` instance
supports. **Default:** `'utf-8'`. supports. **Default:** `'utf-8'`.
* `options` {Object} * `options` {Object}
* `fatal` {boolean} `true` if decoding failures are fatal. This option is only * `fatal` {boolean} `true` if decoding failures are fatal.
supported when ICU is enabled (see [Internationalization][]). **Default:** This option is not supported when ICU is disabled
`false`. (see [Internationalization][]). **Default:** `false`.
* `ignoreBOM` {boolean} When `true`, the `TextDecoder` will include the byte * `ignoreBOM` {boolean} When `true`, the `TextDecoder` will include the byte
order mark in the decoded result. When `false`, the byte order mark will order mark in the decoded result. When `false`, the byte order mark will
be removed from the output. This option is only used when `encoding` is be removed from the output. This option is only used when `encoding` is

View File

@ -27,7 +27,7 @@ internationalization functionality.
```shell ```shell
./configure \ ./configure \
--with-intl=small-icu \ --with-intl=full-icu \
--with-icu-source=http://download.icu-project.org/files/icu4c/58.1/icu4c-58_1-src.tgz --with-icu-source=http://download.icu-project.org/files/icu4c/58.1/icu4c-58_1-src.tgz
make make
``` ```
@ -54,7 +54,7 @@ Also running
new Intl.DateTimeFormat('es', {month: 'long'}).format(new Date(9E8)); new Intl.DateTimeFormat('es', {month: 'long'}).format(new Date(9E8));
``` ```
…Should return `January` not `enero`. …Should return `enero` not `January`.
* Now, copy `deps/icu` over to `deps/icu-small` * Now, copy `deps/icu` over to `deps/icu-small`
@ -94,12 +94,12 @@ tools/license-builder.sh
* Update the URL and hash for the full ICU file in `tools/icu/current_ver.dep`. * Update the URL and hash for the full ICU file in `tools/icu/current_ver.dep`.
It should match the ICU URL used in the first step. When this is done, the It should match the ICU URL used in the first step. When this is done, the
following should build with full ICU. following should build with small ICU.
```shell ```shell
# clean up # clean up
rm -rf out deps/icu deps/icu4c* rm -rf out deps/icu deps/icu4c*
./configure --with-intl=full-icu --download=all ./configure --with-intl=small-icu --download=all
make make
make test-ci make test-ci
``` ```

View File

@ -212,16 +212,17 @@
'conditions': [ 'conditions': [
[ 'icu_small == "false"', { # and OS=win [ 'icu_small == "false"', { # and OS=win
# full data - just build the full data file, then we are done. # full data - just build the full data file, then we are done.
'sources': [ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major)<(icu_endianness)_dat.obj' ], 'sources': [ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major)<(icu_endianness)_dat.<(icu_asm_ext)' ],
'dependencies': [ 'genccode#host' ], 'dependencies': [ 'genccode#host' ],
'actions': [ 'actions': [
{ {
'action_name': 'icudata', 'action_name': 'icudata',
'msvs_quote_cmd': 0, 'msvs_quote_cmd': 0,
'inputs': [ '<(icu_data_in)' ], 'inputs': [ '<(icu_data_in)' ],
'outputs': [ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major)<(icu_endianness)_dat.obj' ], 'outputs': [ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major)<(icu_endianness)_dat.<(icu_asm_ext)' ],
# on Windows, we can go directly to .obj file (-o) option.
'action': [ '<(PRODUCT_DIR)/genccode', 'action': [ '<(PRODUCT_DIR)/genccode',
'-o', '<@(icu_asm_opts)', # -o
'-d', '<(SHARED_INTERMEDIATE_DIR)', '-d', '<(SHARED_INTERMEDIATE_DIR)',
'-n', 'icudata', '-n', 'icudata',
'-e', 'icudt<(icu_ver_major)', '-e', 'icudt<(icu_ver_major)',
@ -256,9 +257,9 @@
'action_name': 'genccode', 'action_name': 'genccode',
'msvs_quote_cmd': 0, 'msvs_quote_cmd': 0,
'inputs': [ '<(SHARED_INTERMEDIATE_DIR)/icutmp/icudt<(icu_ver_major)<(icu_endianness).dat' ], 'inputs': [ '<(SHARED_INTERMEDIATE_DIR)/icutmp/icudt<(icu_ver_major)<(icu_endianness).dat' ],
'outputs': [ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major)<(icu_endianness)_dat.obj' ], 'outputs': [ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major)<(icu_endianness)_dat.<(icu_asm_ext)' ],
'action': [ '<(PRODUCT_DIR)/genccode', 'action': [ '<(PRODUCT_DIR)/genccode',
'-o', '<@(icu_asm_opts)', # -o
'-d', '<(SHARED_INTERMEDIATE_DIR)/', '-d', '<(SHARED_INTERMEDIATE_DIR)/',
'-n', 'icudata', '-n', 'icudata',
'-e', 'icusmdt<(icu_ver_major)', '-e', 'icusmdt<(icu_ver_major)',
@ -266,20 +267,20 @@
}, },
], ],
# This file contains the small ICU data. # This file contains the small ICU data.
'sources': [ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major)<(icu_endianness)_dat.obj' ], 'sources': [ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major)<(icu_endianness)_dat.<(icu_asm_ext)' ],
} ] ], #end of OS==win and icu_small == true } ] ], #end of OS==win and icu_small == true
}, { # OS != win }, { # OS != win
'conditions': [ 'conditions': [
[ 'icu_small == "false"', { [ 'icu_small == "false"', {
# full data - just build the full data file, then we are done. # full data - no trim needed
'sources': [ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major)_dat.c' ], 'sources': [ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major)_dat.<(icu_asm_ext)' ],
'dependencies': [ 'genccode#host', 'icupkg#host', 'icu_implementation#host', 'icu_uconfig' ], 'dependencies': [ 'genccode#host', 'icupkg#host', 'icu_implementation#host', 'icu_uconfig' ],
'include_dirs': [ 'include_dirs': [
'<(icu_path)/source/common', '<(icu_path)/source/common',
], ],
'actions': [ 'actions': [
{ {
# Swap endianness (if needed), or at least copy the file # Copy the .dat file, swapping endianness if needed.
'action_name': 'icupkg', 'action_name': 'icupkg',
'inputs': [ '<(icu_data_in)' ], 'inputs': [ '<(icu_data_in)' ],
'outputs':[ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major)<(icu_endianness).dat' ], 'outputs':[ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major)<(icu_endianness).dat' ],
@ -290,7 +291,7 @@
], ],
}, },
{ {
# Rename without the endianness marker # Rename without the endianness marker (icudt64l.dat -> icudt64.dat)
'action_name': 'copy', 'action_name': 'copy',
'inputs': [ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major)<(icu_endianness).dat' ], 'inputs': [ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major)<(icu_endianness).dat' ],
'outputs':[ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major).dat' ], 'outputs':[ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major).dat' ],
@ -300,12 +301,14 @@
], ],
}, },
{ {
# convert full ICU data file to .c, or .S, etc.
'action_name': 'icudata', 'action_name': 'icudata',
'inputs': [ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major).dat' ], 'inputs': [ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major).dat' ],
'outputs':[ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major)_dat.c' ], 'outputs':[ '<(SHARED_INTERMEDIATE_DIR)/icudt<(icu_ver_major)_dat.<(icu_asm_ext)' ],
'action': [ '<(PRODUCT_DIR)/genccode', 'action': [ '<(PRODUCT_DIR)/genccode',
'-e', 'icudt<(icu_ver_major)', '-e', 'icudt<(icu_ver_major)',
'-d', '<(SHARED_INTERMEDIATE_DIR)', '-d', '<(SHARED_INTERMEDIATE_DIR)',
'<@(icu_asm_opts)',
'-f', 'icudt<(icu_ver_major)_dat', '-f', 'icudt<(icu_ver_major)_dat',
'<@(_inputs)' ], '<@(_inputs)' ],
}, },
@ -318,7 +321,8 @@
'export_dependent_settings': [ 'icustubdata' ], 'export_dependent_settings': [ 'icustubdata' ],
'actions': [ 'actions': [
{ {
# trim down ICU # Trim down ICU.
# Note that icupkg is invoked automatically, swapping endianness if needed.
'action_name': 'icutrim', 'action_name': 'icutrim',
'inputs': [ '<(icu_data_in)', 'icu_small.json' ], 'inputs': [ '<(icu_data_in)', 'icu_small.json' ],
'outputs': [ '<(SHARED_INTERMEDIATE_DIR)/icutmp/icudt<(icu_ver_major)<(icu_endianness).dat' ], 'outputs': [ '<(SHARED_INTERMEDIATE_DIR)/icutmp/icudt<(icu_ver_major)<(icu_endianness).dat' ],
@ -333,7 +337,7 @@
'-v', '-v',
'-L', '<(icu_locales)'], '-L', '<(icu_locales)'],
}, { }, {
# rename to get the final entrypoint name right # rename to get the final entrypoint name right (icudt64l.dat -> icusmdt64.dat)
'action_name': 'rename', 'action_name': 'rename',
'inputs': [ '<(SHARED_INTERMEDIATE_DIR)/icutmp/icudt<(icu_ver_major)<(icu_endianness).dat' ], 'inputs': [ '<(SHARED_INTERMEDIATE_DIR)/icutmp/icudt<(icu_ver_major)<(icu_endianness).dat' ],
'outputs': [ '<(SHARED_INTERMEDIATE_DIR)/icutmp/icusmdt<(icu_ver_major).dat' ], 'outputs': [ '<(SHARED_INTERMEDIATE_DIR)/icutmp/icusmdt<(icu_ver_major).dat' ],
@ -342,17 +346,18 @@
'<@(_outputs)', '<@(_outputs)',
], ],
}, { }, {
# build final .dat -> .obj # For icu-small, always use .c, don't try to use .S, etc.
'action_name': 'genccode', 'action_name': 'genccode',
'inputs': [ '<(SHARED_INTERMEDIATE_DIR)/icutmp/icusmdt<(icu_ver_major).dat' ], 'inputs': [ '<(SHARED_INTERMEDIATE_DIR)/icutmp/icusmdt<(icu_ver_major).dat' ],
'outputs': [ '<(SHARED_INTERMEDIATE_DIR)/icusmdt<(icu_ver_major)_dat.c' ], 'outputs': [ '<(SHARED_INTERMEDIATE_DIR)/icusmdt<(icu_ver_major)_dat.<(icu_asm_ext)' ],
'action': [ '<(PRODUCT_DIR)/genccode', 'action': [ '<(PRODUCT_DIR)/genccode',
'<@(icu_asm_opts)',
'-d', '<(SHARED_INTERMEDIATE_DIR)', '-d', '<(SHARED_INTERMEDIATE_DIR)',
'<@(_inputs)' ], '<@(_inputs)' ],
}, },
], ],
# This file contains the small ICU data # This file contains the small ICU data
'sources': [ '<(SHARED_INTERMEDIATE_DIR)/icusmdt<(icu_ver_major)_dat.c' ], 'sources': [ '<(SHARED_INTERMEDIATE_DIR)/icusmdt<(icu_ver_major)_dat.<(icu_asm_ext)' ],
# for umachine.h # for umachine.h
'include_dirs': [ 'include_dirs': [
'<(icu_path)/source/common', '<(icu_path)/source/common',

View File

@ -5,14 +5,15 @@ import os
import re import re
import sys import sys
import shutil import shutil
import bz2
parser = optparse.OptionParser() parser = optparse.OptionParser()
parser.add_option('--icu-small', parser.add_option('--icudst',
action='store', action='store',
dest='icusmall', dest='icudst',
default='deps/icu-small', default='deps/icu-small',
help='path to target ICU directory to shrink. Will be deleted.') help='path to target ICU directory. Will be deleted.')
parser.add_option('--icu-src', parser.add_option('--icu-src',
action='store', action='store',
@ -26,18 +27,26 @@ parser.add_option('--icutmp',
default='out/Release/obj/gen/icutmp', default='out/Release/obj/gen/icutmp',
help='path to icutmp dir.') help='path to icutmp dir.')
(options, args) = parser.parse_args() (options, args) = parser.parse_args()
if os.path.isdir(options.icusmall): if os.path.isdir(options.icudst):
print('Deleting existing icusmall %s' % (options.icusmall)) print('Deleting existing icudst %s' % (options.icudst))
shutil.rmtree(options.icusmall) shutil.rmtree(options.icudst)
if not os.path.isdir(options.icusrc): if not os.path.isdir(options.icusrc):
print('Missing source ICU dir --icusrc=%s' % (options.icusrc)) print('Missing source ICU dir --icusrc=%s' % (options.icusrc))
sys.exit(1) sys.exit(1)
# compression stuff. Keep the suffix and the compression function in sync.
compression_suffix = '.bz2'
def compress_data(infp, outfp):
with open(infp, 'rb') as inf:
with bz2.BZ2File(outfp, 'wb') as outf:
shutil.copyfileobj(inf, outf)
def print_size(fn):
size = (os.stat(fn).st_size) / 1024000
print('%dM\t%s' % (size, fn))
ignore_regex = re.compile('^.*\.(vcxproj|filters|nrm|icu|dat|xml|txt|ac|guess|m4|in|sub|py|mak)$') ignore_regex = re.compile('^.*\.(vcxproj|filters|nrm|icu|dat|xml|txt|ac|guess|m4|in|sub|py|mak)$')
@ -90,36 +99,41 @@ def icu_info(icu_full_path):
return (icu_ver_major, icu_endianness) return (icu_ver_major, icu_endianness)
(icu_ver_major, icu_endianness) = icu_info(options.icusrc) (icu_ver_major, icu_endianness) = icu_info(options.icusrc)
print("icudt%s%s" % (icu_ver_major, icu_endianness)) print("Data file root: icudt%s%s" % (icu_ver_major, icu_endianness))
dst_datafile = os.path.join(options.icudst, "source","data","in", "icudt%s%s.dat" % (icu_ver_major, icu_endianness))
src_datafile = os.path.join(options.icutmp, "icusmdt%s.dat" % (icu_ver_major)) src_datafile = os.path.join(options.icusrc, "source/data/in/icudt%sl.dat" % (icu_ver_major))
dst_datafile = os.path.join(options.icusmall, "source","data","in", "icudt%s%s.dat" % (icu_ver_major, icu_endianness)) dst_cmp_datafile = "%s%s" % (dst_datafile, compression_suffix)
if not os.path.isfile(src_datafile): if not os.path.isfile(src_datafile):
print("Could not find source datafile %s - did you build small-icu node?" % src_datafile) print("Error: icu data file not found: %s" % src_datafile)
sys.exit(1) exit(1)
else:
print("will use small datafile %s" % (src_datafile)) print("will use datafile %s" % (src_datafile))
print('%s --> %s' % (options.icusrc, options.icusmall))
shutil.copytree(options.icusrc, options.icusmall, ignore=icu_ignore) print('%s --> %s' % (options.icusrc, options.icudst))
print('%s --> %s' % (src_datafile, dst_datafile)) shutil.copytree(options.icusrc, options.icudst, ignore=icu_ignore)
# now, make the data dir (since we ignored it) # now, make the data dir (since we ignored it)
os.mkdir(os.path.join(os.path.join(options.icusmall, "source", "data"))) icudst_data = os.path.join(options.icudst, "source", "data")
os.mkdir(os.path.join(os.path.join(options.icusmall, "source", "data", "in"))) icudst_in = os.path.join(icudst_data, "in")
os.mkdir(icudst_data)
os.mkdir(icudst_in)
# OK, now copy the data file print_size(src_datafile)
shutil.copy(src_datafile, dst_datafile)
print('%s --compress-> %s' % (src_datafile, dst_cmp_datafile))
compress_data(src_datafile, dst_cmp_datafile)
print_size(dst_cmp_datafile)
readme_name = os.path.join(options.icudst, "README-FULL-ICU.txt" )
# Now, print a short notice # Now, print a short notice
readme_name = os.path.join(options.icusmall, "README-SMALL-ICU.txt" )
fi = open(readme_name, 'wb') fi = open(readme_name, 'wb')
print("Small ICU sources - auto generated by shrink-icu-src.py", file=fi) print("ICU sources - auto generated by shrink-icu-src.py", file=fi)
print("", file=fi) print("", file=fi)
print("This directory contains the ICU subset used by --with-intl=small-icu (the default)", file=fi) print("This directory contains the ICU subset used by --with-intl=full-icu", file=fi)
print("It is a strict subset of ICU %s source files with the following exception(s):" % (icu_ver_major), file=fi) print("It is a strict subset of ICU %s source files with the following exception(s):" % (icu_ver_major), file=fi)
print("* %s : Reduced-size data file" % (dst_datafile), file=fi) print("* %s : compressed data file" % (dst_cmp_datafile), file=fi)
print("", file=fi) print("", file=fi)
print("", file=fi) print("", file=fi)
print("To rebuild this directory, see ../../tools/icu/README.md", file=fi) print("To rebuild this directory, see ../../tools/icu/README.md", file=fi)