# GENDICT(1) - man - phpman

[GENDICT(1)](https://www.chedong.com/phpMan.php/man/GENDICT/1/markdown)                                 ICU 70.1 Manual                                [GENDICT(1)](https://www.chedong.com/phpMan.php/man/GENDICT/1/markdown)



## NAME
       **gendict** - Compiles word list into ICU string trie dictionary

## SYNOPSIS
       **gendict**  [  **--uchars** | **--bytes** **--transform** _transform_ ] [ **-h**, **-?**, **--help** ] [ **-V**, **--version** ] [
### -c --copyright -v --verbose -i --icudatadir

## DESCRIPTION
       **gendict** reads the word list from _dictionary-file_ and creates a string trie  dictionary  file.
       Normally this data file has the **.dict** extension.

       Words  begin  at  the  beginning of a line and are terminated by the first whitespace.  Lines
       that begin with whitespace are ignored.

## OPTIONS
### -h -? --help
              Print help about usage and exit.

### -V --version
              Print the version of **gendict** and exit.

### -c --copyright
              Embeds the standard ICU copyright into the _output-file_.

### -v --verbose
              Display extra informative messages during execution.

### -i --icudatadir
              Look for any necessary ICU data files in _directory_.  For example, the file  **pnames.icu**
              must  be  located  when  ICU's data is not built as a shared library.  The default ICU
              data directory is specified by the environment variable **ICU**___**DATA**.  Most configurations
              of ICU do not require this argument.

### --uchars
              Set the output trie type to UChar. Mutually exclusive with **--bytes.**

### --bytes
              Set the output trie type to Bytes. Mutually exclusive with **--uchars.**

### --transform
              Set  the  transform  type. Should only be specified with **--bytes.**  Currently supported
              transforms are: **offset-<hex-number>,** which specifies an offset to  subtract  from  all
              input  characters.   It  should be noted that the offset transform also maps U+200D to
              0xFF and U+200C to 0xFE, in order to offer compatibility  to  languages  that  require
              these characters.  A transform must be specified for a bytes trie, and when applied to
              the non-value characters in the _input-file_ must produce output between 0x00 and 0xFF.

### input-file
              The source file to read.

### output-file
              The file to write the output dictionary to.

## CAVEATS
       The _input-file_ is assumed to be encoded in UTF-8.  The integers in the  _input-file_  that  are
       used as values must be made up of ASCII digits. They may be specified either in hex, by using
       a 0x prefix, or in decimal.  Either **--bytes** or **--uchars** must be specified.

## ENVIRONMENT
       **ICU**___**DATA**  Specifies the directory containing ICU data. Defaults to **${prefix}/share/icu/70.1/**.
                 Some  tools  in ICU depend on the presence of the trailing slash. It is thus impor‐
                 tant to make sure that it is present if **ICU**___**DATA** is set.

## AUTHORS
       Maxime Serrano

## VERSION
       1.0

## COPYRIGHT
       Copyright (C) 2012 International Business Machines Corporation and others

## SEE ALSO
### <http://www.icu-project.org/userguide/boundaryAnalysis.html>




ICU MANPAGE                                  1 June 2012                                  [GENDICT(1)](https://www.chedong.com/phpMan.php/man/GENDICT/1/markdown)
