GENDICT(1)                                 ICU 70.1 Manual                                GENDICT(1)



NAME
       gendict - Compiles word list into ICU string trie dictionary

SYNOPSIS
       gendict  [  --uchars | --bytes --transform transform ] [ -h, -?, --help ] [ -V, --version ] [
       -c, --copyright ] [ -v, --verbose ] [ -i, --icudatadir directory ]  input-file  output-file
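
       As an illustration of the synopsis, the following sketch assembles a gendict command
       line and enforces the pairing rules between --uchars, --bytes and --transform.  The
       helper name build_gendict_argv, its parameters, and the file names are hypothetical and
       not part of ICU; only the flags themselves come from this page.

```python
def build_gendict_argv(input_file, output_file, trie, transform=None,
                       icu_data_dir=None, verbose=False, copyright_notice=False):
    """Build an argument list for gendict (hypothetical helper, not part of ICU)."""
    if trie not in ("uchars", "bytes"):
        raise ValueError("trie must be 'uchars' or 'bytes'")
    # A bytes trie needs a transform; a transform is only meant for a bytes trie.
    if trie == "bytes" and transform is None:
        raise ValueError("--bytes requires --transform")
    if trie == "uchars" and transform is not None:
        raise ValueError("--transform should only be given with --bytes")

    argv = ["gendict", "--" + trie]
    if transform is not None:
        argv += ["--transform", transform]
    if copyright_notice:
        argv.append("--copyright")
    if verbose:
        argv.append("--verbose")
    if icu_data_dir is not None:
        argv += ["--icudatadir", icu_data_dir]
    # The two positional arguments come last: input-file, then output-file.
    argv += [input_file, output_file]
    return argv


# Example with made-up file names and an illustrative offset value.
print(build_gendict_argv("thai.txt", "thai.dict", "bytes",
                         transform="offset-0x0e00"))
```

       The resulting list can be handed to subprocess.run(argv, check=True) on a system where
       gendict is installed.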

DESCRIPTION
       gendict reads a word list from input-file and writes a string trie dictionary to output-file.
       Normally the output file has the .dict extension.

       Words  begin  at  the  beginning of a line and are terminated by the first whitespace.  Lines
       that begin with whitespace are ignored.
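
       The line rules above can be sketched as follows.  This is a minimal illustration of the
       description only, not gendict's own parser: the function name extract_words is
       hypothetical, and Python's notion of whitespace is assumed to be close enough to the
       tool's for the purpose of the example.

```python
def extract_words(lines):
    """Return the word from each line, following the rules described above."""
    words = []
    for line in lines:
        # Empty lines and lines that begin with whitespace are ignored.
        if not line or line[0].isspace():
            continue
        # The word runs from the start of the line to the first whitespace.
        words.append(line.split(None, 1)[0])
    return words


sample = ["apple 10\n", "  this line is ignored\n", "banana\n", "\n"]
print(extract_words(sample))
```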

OPTIONS
       -h, -?, --help
              Print help about usage and exit.

       -V, --version
              Print the version of gendict and exit.

       -c, --copyright
              Embed the standard ICU copyright notice in the output-file.

       -v, --verbose
              Display extra informative messages during execution.

       -i, --icudatadir directory
              Look for any necessary ICU data files in directory.  For example, the file  pnames.icu
              must  be  located  when  ICU's data is not built as a shared library.  The default ICU
              data directory is specified by the environment variable ICU_DATA.  Most configurations
              of ICU do not require this argument.

       --uchars
              Set the output trie type to UChar. Mutually exclusive with --bytes.

       --bytes
              Set the output trie type to Bytes. Mutually exclusive with --uchars.

       --transform transform
              Set the transform type.  Use only with --bytes, which requires a transform.  The
              currently supported transform is offset-<hex-number>, which subtracts the given offset
              from every input character.  The offset transform also maps U+200D to 0xFF and U+200C
              to 0xFE, for compatibility with languages that require these characters.  Applied to
              the non-value characters in the input-file, the transform must produce values between
              0x00 and 0xFF.

       input-file
              The source file to read.

       output-file
              The file to write the output dictionary to.
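
       The offset transform described under --transform can be sketched as below.  The code
       follows only the wording of this page (subtract the offset, map U+200D to 0xFF and
       U+200C to 0xFE, require a result between 0x00 and 0xFF); the function names and the
       sample offset are assumptions, and the real tool may apply further checks.

```python
def offset_transform(ch, offset):
    """Map one character to a byte value as described for offset-<hex-number>."""
    cp = ord(ch)
    # The two joiner characters get fixed byte values.
    if cp == 0x200D:
        return 0xFF
    if cp == 0x200C:
        return 0xFE
    value = cp - offset
    if not 0x00 <= value <= 0xFF:
        raise ValueError(
            "U+%04X does not fit in one byte with offset 0x%X" % (cp, offset))
    return value


def transform_word(word, offset):
    """Apply the transform to every character of a word."""
    return bytes(offset_transform(ch, offset) for ch in word)


# Two Thai letters with an illustrative offset of 0x0E00.
print(transform_word("\u0e01\u0e32", 0x0E00).hex())
```

       A word containing a character outside the 256-value window, such as an ASCII letter
       with the offset above, raises ValueError in this sketch, which is a convenient way to
       pre-check a word list before choosing an offset.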

CAVEATS
       The input-file is assumed to be encoded in UTF-8.  The integers in the  input-file  that  are
       used as values must be made up of ASCII digits. They may be specified either in hex, by using
       a 0x prefix, or in decimal.  Exactly one of --bytes and --uchars must be specified.
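
       A word list can be checked against the value rule before it is passed to gendict.  The
       sketch below is an assumption-laden illustration, not the tool's parser: parse_value is
       a hypothetical name, and hex letters a-f are assumed to be accepted in either case
       after the 0x prefix.

```python
def parse_value(token):
    """Parse a dictionary value written in decimal or in hex with a 0x prefix."""
    if token.startswith("0x"):
        digits, base = token[2:], 16
        allowed = "0123456789abcdefABCDEF"
    else:
        digits, base = token, 10
        allowed = "0123456789"
    # Check explicitly for ASCII: int() alone would also accept
    # non-ASCII digits such as Arabic-Indic ones.
    if not digits or any(c not in allowed for c in digits):
        raise ValueError("value must consist of ASCII digits: %r" % token)
    return int(digits, base)


print(parse_value("42"), parse_value("0x1F"))
```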

ENVIRONMENT
       ICU_DATA  Specifies the directory containing ICU data. Defaults to ${prefix}/share/icu/70.1/.
                 Some tools in ICU depend on the presence of the trailing slash. It is thus
                 important to make sure that it is present if ICU_DATA is set.
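
       When a script sets ICU_DATA before running an ICU tool, the trailing slash can be added
       defensively.  The helpers below are a hypothetical sketch that assumes POSIX-style "/"
       separators; the directory shown is a made-up example.

```python
import os


def with_trailing_slash(path):
    """Return path unchanged if it ends with '/', otherwise append one."""
    return path if path.endswith("/") else path + "/"


def icu_env(data_dir, base_env=None):
    """Copy an environment mapping and set ICU_DATA with a trailing slash."""
    env = dict(os.environ if base_env is None else base_env)
    env["ICU_DATA"] = with_trailing_slash(data_dir)
    return env


print(icu_env("/opt/icu/share/icu/70.1", {})["ICU_DATA"])
```

       The returned mapping can be passed as the env argument of subprocess.run when launching
       the tool.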

AUTHORS
       Maxime Serrano

VERSION
       1.0

COPYRIGHT
       Copyright (C) 2012 International Business Machines Corporation and others

SEE ALSO
       http://www.icu-project.org/userguide/boundaryAnalysis.html




ICU MANPAGE                                  1 June 2012                                  GENDICT(1)
