from small one page howto to huge articles all in one place
poll results
Last additions:
May 25th. 2007:
April, 26th. 2006:
|
You are here: manpages
GENDICT
Section: ICU 58.2 Manual (1) Updated: 1 June 2012 Index
Return to Main Contents
NAME
gendict
- Compiles word list into ICU string trie dictionary
SYNOPSIS
gendict
[
--uchars
|
--bytes
--transform transform
]
[
-h, -?, --help
]
[
-V, --version
]
[
-c, --copyright
]
[
-v, --verbose
]
[
-i, --icudatadir directory
]
input-file
output-file
DESCRIPTION
gendict
reads the word list from
dictionary-file
and creates a string trie dictionary file. Normally this data file has the
.dict
extension.
Words begin at the beginning of a line and are terminated by the first whitespace.
Lines that begin with whitespace are ignored.
OPTIONS
- -h, -?, --help
-
Print help about usage and exit.
- -V, --version
-
Print the version of
gendict
and exit.
- -c, --copyright
-
Embeds the standard ICU copyright into the
output-file.
- -v, --verbose
-
Display extra informative messages during execution.
- -i, --icudatadir directory
-
Look for any necessary ICU data files in
directory.
For example, the file
pnames.icu
must be located when ICU's data is not built as a shared library.
The default ICU data directory is specified by the environment variable
ICU_DATA.
Most configurations of ICU do not require this argument.
- --uchars
-
Set the output trie type to UChar. Mutually exclusive with
--bytes.
- --bytes
-
Set the output trie type to Bytes. Mutually exclusive with
--uchars.
- --transform
-
Set the transform type. Should only be specified with
--bytes.
Currently supported transforms are:
offset-<hex-number>,
which specifies an offset to subtract from all input characters.
It should be noted that the offset transform also maps U+200D
to 0xFF and U+200C to 0xFE, in order to offer compatibility to
languages that require these characters.
A transform must be specified for a bytes trie, and when applied
to the non-value characters in the
input-file
must produce output between 0x00 and 0xFF.
- input-file
-
The source file to read.
- output-file
-
The file to write the output dictionary to.
CAVEATS
The
input-file
is assumed to be encoded in UTF-8.
The integers in the
input-file
that are used as values must be made up of ASCII digits. They
may be specified either in hex, by using a 0x prefix, or in
decimal.
Either
--bytes
or
--uchars
must be specified.
ENVIRONMENT
- ICU_DATA
-
Specifies the directory containing ICU data. Defaults to
/usr/share/icu/58.2/.
Some tools in ICU depend on the presence of the trailing slash. It is thus
important to make sure that it is present if
ICU_DATA
is set.
AUTHORS
Maxime Serrano
VERSION
1.0
COPYRIGHT
Copyright (C) 2012 International Business Machines Corporation and others
SEE ALSO
http://www.icu-project.org/userguide/boundaryAnalysis.html
Index
- NAME
-
- SYNOPSIS
-
- DESCRIPTION
-
- OPTIONS
-
- CAVEATS
-
- ENVIRONMENT
-
- AUTHORS
-
- VERSION
-
- COPYRIGHT
-
- SEE ALSO
-
|