NKF

NAME
SYNOPSIS
DESCRIPTION
    Nkf is yet another kanji code converter for use among networks, hosts
    and terminals. It converts the input kanji code to a designated kanji
    code such as ISO-2022-JP, Shift_JIS, EUC-JP, UTF-8, UTF-16 or UTF-32.

    One of nkf's most distinctive features is that it guesses the input
    kanji encoding. It currently recognizes ISO-2022-JP, Shift_JIS, EUC-JP,
    UTF-8, UTF-16 and UTF-32, so users need not set the input kanji code
    explicitly.
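
    The source does not describe how the guess is made. As a rough
    illustration of the idea only, the Python sketch below tries strict
    decoders in a fixed order and reports the first one that accepts the
    bytes. The function name guess_japanese_encoding and the candidate
    order are assumptions for this sketch, not nkf's algorithm.

```python
def guess_japanese_encoding(data: bytes) -> str:
    """Return the first candidate codec that decodes `data` without error."""
    # 7-bit input: ISO-2022-JP announces kanji with an ESC $ sequence.
    if all(byte < 0x80 for byte in data):
        return "iso2022_jp" if b"\x1b$" in data else "ascii"
    # 8-bit input: try strict decoders in a fixed (assumed) order.
    for codec in ("utf-8", "euc_jp", "shift_jis"):
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue
        return codec
    return "unknown"


sample = "日本語"
for codec in ("iso2022_jp", "utf-8", "euc_jp", "shift_jis"):
    print(codec, "->", guess_japanese_encoding(sample.encode(codec)))
```

    Short inputs can be valid in more than one encoding, so any guess of
    this kind can be wrong; an explicit input option removes the ambiguity.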

    By default, X0201 kana is converted into X0208 kana. For X0201 kana,
    the SO/SI, SS2 and ESC-(-I methods are supported. For automatic code
    detection, nkf assumes there is no X0201 kana in Shift_JIS. To accept
    X0201 in Shift_JIS, use -X, -x or -S.
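
    To show what the X0201-to-X0208 kana conversion looks like on Unicode
    text, here is a small Python sketch that uses NFKC normalization as a
    stand-in. This is an assumption for illustration: NFKC also folds other
    compatibility characters (for example fullwidth Latin letters), and
    nkf's own mapping table may differ. The helper name
    halfwidth_kana_to_fullwidth is hypothetical.

```python
import unicodedata


def halfwidth_kana_to_fullwidth(text: str) -> str:
    # NFKC maps halfwidth katakana to fullwidth katakana and composes a
    # following voiced sound mark into the base character.
    return unicodedata.normalize("NFKC", text)


halfwidth = "\uff76\uff9e\uff77"  # halfwidth KA, voiced mark, halfwidth KI
print(halfwidth_kana_to_fullwidth(halfwidth))
```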

    Multiple options are specified as separate strings; every argument
    except the last is treated as an option, such as

      print nkf('--ic=UTF8-MAC', '-w', $string), "\n";
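
    The UTF8-MAC codeset used in this example is listed under --ic as
    decomposed UTF-8. The Python sketch below shows what "decomposed" means
    and how such text can be recomposed. It uses Unicode NFC normalization
    as an approximation; whether nkf's UTF8-MAC handling matches NFC in
    every case is not stated in the source.

```python
import unicodedata

# Decomposed form: HIRAGANA KA followed by the combining voiced sound mark.
decomposed = "\u304b\u3099"
# Composed form: the single code point HIRAGANA GA.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(decomposed.encode("utf-8").hex(" "), "->", composed.encode("utf-8").hex(" "))
```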

OPTIONS
    -J -S -E -W -W16 -W32 -j -s -e -w -w16 -w32
        Specify input and output encodings. Upper-case options select the
        input encoding, lower-case options the output encoding. See also
        --ic and --oc.

        -J  ISO-2022-JP (JIS code).

        -S  Shift_JIS and JIS X 0201 kana. Input that looks like EUC-JP is
            treated as X0201 kana. Without the -x flag, JIS X 0201 Katakana
            (a.k.a. halfwidth kana) is converted into JIS X 0208. If you use
            Windows, see Windows-31J (CP932).

        -E  EUC-JP.

        -W  UTF-8N.

        -W16[BL][0]
            UTF-16. B or L selects Big Endian or Little Endian. 0 controls
            whether a BOM is written.

        -W32[BL][0]
            UTF-32. B or L selects Big Endian or Little Endian. 0 controls
            whether a BOM is written.

    -b -u
        With -b, output is buffered (DEFAULT). With -u, output is
        unbuffered.

    -t  No conversion.

    -i[@B]
        Specify the escape sequence for JIS X 0208.

        -i@ Use ESC $ @. (JIS X 0208-1978)

        -iB Use ESC $ B. (JIS X 0208-1983/1990 DEFAULT)

    -o[BJ]
        Specify the escape sequence for US-ASCII/JIS X 0201 Roman. (DEFAULT
        B)

    -r  {de/en}crypt ROT13/47
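
        ROT13 and ROT47 are self-inverse rotations, which is why one option
        both encrypts and decrypts. The Python sketch below shows the two
        rotations on ASCII text only. How nkf divides ROT13 and ROT47
        between ASCII and kanji is not stated here, so this sketch makes no
        claim about that; the helper names rot13 and rot47 are
        illustrative.

```python
import codecs


def rot13(text: str) -> str:
    # Rotate ASCII letters by 13 places; other characters pass through.
    return codecs.encode(text, "rot_13")


def rot47(text: str) -> str:
    # Rotate the 94 printable ASCII characters (33..126) by 47 places.
    out = []
    for ch in text:
        code = ord(ch)
        if 33 <= code <= 126:
            out.append(chr(33 + (code - 33 + 47) % 94))
        else:
            out.append(ch)
    return "".join(out)


print(rot13("Hello"))         # Uryyb
print(rot13(rot13("Hello")))  # Hello
print(rot47(rot47("nkf-2.1")))
```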

    -h[123] --hiragana --katakana --katakana-hiragana

        -h1 --hiragana
            Katakana to Hiragana conversion.

        -h2 --katakana
            Hiragana to Katakana conversion.

        -h3 --katakana-hiragana
            Katakana to Hiragana and Hiragana to Katakana conversion.
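
        In Unicode, the hiragana and katakana blocks are laid out in
        parallel, 0x60 code points apart, so the three conversions above
        can be sketched as simple offsets. This Python sketch works on
        Unicode strings and is an illustration only; nkf itself operates on
        the legacy encodings, and the function names are hypothetical.

```python
HIRAGANA = range(0x3041, 0x3097)  # small A .. small KE
KATAKANA = range(0x30A1, 0x30F7)  # small A .. small KE
OFFSET = 0x60                     # distance between the two blocks


def to_hiragana(text: str) -> str:
    """Like -h1: katakana becomes hiragana."""
    return "".join(chr(ord(c) - OFFSET) if ord(c) in KATAKANA else c for c in text)


def to_katakana(text: str) -> str:
    """Like -h2: hiragana becomes katakana."""
    return "".join(chr(ord(c) + OFFSET) if ord(c) in HIRAGANA else c for c in text)


def swap_kana(text: str) -> str:
    """Like -h3: both directions at once."""
    out = []
    for c in text:
        code = ord(c)
        if code in KATAKANA:
            out.append(chr(code - OFFSET))
        elif code in HIRAGANA:
            out.append(chr(code + OFFSET))
        else:
            out.append(c)
    return "".join(out)


print(to_hiragana("カタカナ"), to_katakana("ひらがな"), swap_kana("ひらカナ"))
```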

    -T  Text mode output (MS-DOS)

    -f[*m* [- *n*]]
        Fold lines at length *m* with a margin of *n*. Without this
        option, fold length is 60 and fold margin is 10.

    -F  New line preserving line folding.

    -Z[0-3]
        Convert X0208 alphabet (Fullwidth Alphabets) to ASCII.

        -Z -Z0
            Convert X0208 alphabet to ASCII.

        -Z1 Convert X0208 kankaku (the fullwidth space) to a single ASCII
            space.

        -Z2 Convert X0208 kankaku (the fullwidth space) to two ASCII
            spaces.

        -Z3 Replace fullwidth >, <, ", & with '&gt;', '&lt;', '&quot;',
            '&amp;' as in HTML.
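
        On Unicode text, the -Z conversions can be sketched as follows.
        Fullwidth ASCII variants sit at a fixed offset of 0xFEE0 from their
        ASCII counterparts, and the fullwidth space is U+3000. The function
        z_convert and its mode argument are hypothetical, and whether the
        nkf levels are cumulative is an assumption made by this sketch.

```python
HTML_ENTITIES = {
    "\uff1e": "&gt;",    # fullwidth >
    "\uff1c": "&lt;",    # fullwidth <
    "\uff02": "&quot;",  # fullwidth "
    "\uff06": "&amp;",   # fullwidth &
}


def z_convert(text: str, mode: int = 0) -> str:
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "\u3000" and mode in (1, 2):
            out.append(" " * mode)            # one or two ASCII spaces
        elif mode == 3 and ch in HTML_ENTITIES:
            out.append(HTML_ENTITIES[ch])     # HTML entity replacement
        elif 0xFF01 <= code <= 0xFF5E:
            out.append(chr(code - 0xFEE0))    # fullwidth to ASCII
        else:
            out.append(ch)
    return "".join(out)


print(z_convert("ＡＢＣ１２３"))
print(repr(z_convert("Ａ\u3000Ｂ", mode=2)))
```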

    -X -x
        With -X or without this option, X0201 kana is converted into X0208
        kana. With -x, nkf tries to preserve X0201 kana and does not
        convert it to X0208. In JIS output, ESC-(-I is used. In EUC output,
        SS2 is used.

    -B[0-2]
        Assume broken JIS-Kanji input that has lost its ESC characters.
        Useful when your site is using the old B-News Nihongo patch.

        -B1 allows any characters after ESC-( or ESC-$.

        -B2 forces ASCII after NL.

    -I  Replace non-ISO-2022-JP characters with a geta character (the
        substitute character in Japanese).

    -m[BQN0]
        Decode MIME ISO-2022-JP/ISO8859-1. (DEFAULT) To see ISO8859-1
        (Latin-1), -l is also necessary.

        -mB Decode MIME base64 encoded stream. Remove header or other part
            before conversion.

        -mQ Decode MIME quoted stream. '_' in quoted stream is converted to
            space.

        -mN Non-strict decoding. It allows line break in the middle of the
            base64 encoding.

        -m0 No MIME decode.
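
        For readers unfamiliar with MIME encoded-words, the Python sketch
        below decodes one base64 word and one quoted word using the
        standard email.header module. It illustrates the wire format only
        and is not nkf's decoder; the helper decode_mime_header and the
        sample headers are made up for the example.

```python
import base64
from email.header import decode_header


def decode_mime_header(value: str) -> str:
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded words come back as bytes plus their charset label.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            parts.append(chunk)
    return "".join(parts)


# Build a base64 encoded-word from ISO-2022-JP bytes.
payload = base64.b64encode("日本語".encode("iso2022_jp")).decode("ascii")
b_word = f"=?ISO-2022-JP?B?{payload}?="
# In a quoted encoded-word, '_' stands for a space.
q_word = "=?ISO-8859-1?Q?caf=E9_au_lait?="

print(b_word, "->", decode_mime_header(b_word))
print(q_word, "->", decode_mime_header(q_word))
```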

    -M  MIME encode. Header style. All ASCII codes and control characters
        are left intact.

        -MB MIME encode Base64 stream. Kanji conversion is performed before
            encoding, so this cannot be used as a picture encoder.

        -MQ Perform quoted encoding.

    -l  Input and output code is ISO8859-1 (Latin-1) and ISO-2022-JP. -s, -e
        and -x are not compatible with this option.

    -L[uwm] -d -c
        Convert line breaks.

        -Lu -d
            unix (LF)

        -Lw -c
            windows (CRLF)

        -Lm mac (CR)

        Without this option, nkf doesn't convert line breaks.
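
        The three targets differ only in the byte sequence that ends a
        line. A minimal Python sketch of the same normalization, assuming
        the input may mix all three styles; the function
        convert_line_breaks and its style letters mirror the -L suffixes
        but are otherwise hypothetical.

```python
import re

# Match CRLF first so it is not split into CR followed by LF.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")
_STYLES = {"u": "\n", "w": "\r\n", "m": "\r"}


def convert_line_breaks(text: str, style: str) -> str:
    return _LINE_BREAK.sub(_STYLES[style], text)


mixed = "a\r\nb\rc\nd"
for style in "uwm":
    print(style, repr(convert_line_breaks(mixed, style)))
```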

    --fj --unix --mac --msdos --windows
        Convert for these systems.

    --jis --euc --sjis --mime --base64
        Convert to named code.

    --jis-input --euc-input --sjis-input --mime-input --base64-input
        Assume the named input system.

    --ic=*input codeset* --oc=*output codeset*
        Set the input or output codeset. NKF supports the following
        codesets; codeset names are case-insensitive.

        ISO-2022-JP
            a.k.a. RFC1468, 7bit JIS, JUNET

        EUC-JP (eucJP-nkf)
            a.k.a. AT&T JIS, Japanese EUC, UJIS

        eucJP-ascii
        eucJP-ms
        CP51932
            Microsoft Version of EUC-JP.

        Shift_JIS
            a.k.a. SJIS, MS_Kanji

        Windows-31J
            a.k.a. CP932

        UTF-8
            same as UTF-8N

        UTF-8N
            UTF-8 without BOM

        UTF-8-BOM
            UTF-8 with BOM

        UTF8-MAC (input only)
            decomposed UTF-8

        UTF-16
            same as UTF-16BE

        UTF-16BE
            UTF-16 Big Endian without BOM

        UTF-16BE-BOM
            UTF-16 Big Endian with BOM

        UTF-16LE
            UTF-16 Little Endian without BOM

        UTF-16LE-BOM
            UTF-16 Little Endian with BOM

        UTF-32
            same as UTF-32BE

        UTF-32BE
            UTF-32 Big Endian without BOM

        UTF-32BE-BOM
            UTF-32 Big Endian with BOM

        UTF-32LE
            UTF-32 Little Endian without BOM

        UTF-32LE-BOM
            UTF-32 Little Endian with BOM
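
        To make the BOM and byte-order variants above concrete, the Python
        sketch below prints the bytes each one produces for the single
        character "A". The dictionary keys reuse the codeset names from
        this list; the bytes come from Python's codecs, not from nkf.

```python
import codecs

text = "A"
variants = {
    "UTF-8N": text.encode("utf-8"),
    "UTF-8-BOM": codecs.BOM_UTF8 + text.encode("utf-8"),
    "UTF-16BE": text.encode("utf-16-be"),
    "UTF-16BE-BOM": codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
    "UTF-16LE": text.encode("utf-16-le"),
    "UTF-16LE-BOM": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
    "UTF-32BE": text.encode("utf-32-be"),
    "UTF-32BE-BOM": codecs.BOM_UTF32_BE + text.encode("utf-32-be"),
    "UTF-32LE": text.encode("utf-32-le"),
    "UTF-32LE-BOM": codecs.BOM_UTF32_LE + text.encode("utf-32-le"),
}

for name, data in variants.items():
    print(f"{name:13} {data.hex(' ')}")
```

        Note that Python's plain "utf-16" and "utf-32" codecs write a BOM,
        whereas this list defines UTF-16 and UTF-32 as Big Endian without
        one, so the sketch spells out the byte order explicitly.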

    --fb-{skip, html, xml, perl, java, subchar}
        Specify the way that nkf handles unassigned characters. Without this
        option, --fb-skip is assumed.
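
        The exact output of each --fb-* mode is not given here. As a loose
        analogy only, Python's codec error handlers offer a similar choice
        when a character cannot be encoded: drop it, write a numeric
        character reference, write a backslash escape, or write a
        substitute character. The mapping between these handlers and the
        nkf option names is an assumption, and encode_with_fallback is a
        hypothetical helper.

```python
def encode_with_fallback(text: str, handler: str) -> bytes:
    # Encode to ASCII so that any non-ASCII character triggers the handler.
    return text.encode("ascii", errors=handler)


sample = "price: 5\u20ac"  # ends with the euro sign, which ASCII lacks
for handler in ("ignore", "xmlcharrefreplace", "backslashreplace", "replace"):
    print(f"{handler:18} {encode_with_fallback(sample, handler)!r}")
```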

    --prefix=*escape character**target character*..
        When nkf converts to Shift_JIS, nkf adds a specified escape
        character to specified 2nd byte of Shift_JIS characters. 1st byte of
        argument is the escape character and following bytes are target
        characters.
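
        This option addresses Shift_JIS characters whose second byte
        collides with an ASCII metacharacter such as the backslash (0x5C).
        The Python sketch below shows one reading of the description: the
        escape byte is inserted directly before a matching second byte.
        That placement, the lead-byte ranges used, and the function
        add_prefix are assumptions of this sketch.

```python
def add_prefix(sjis: bytes, escape: bytes, targets: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(sjis):
        byte = sjis[i]
        # Lead bytes of two-byte Shift_JIS characters.
        is_lead = 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC
        if is_lead and i + 1 < len(sjis):
            second = sjis[i + 1]
            out.append(byte)
            if second in targets:
                out += escape      # escape only the second byte
            out.append(second)
            i += 2
        else:
            out.append(byte)       # single-byte characters pass through
            i += 1
    return bytes(out)


raw = "表".encode("shift_jis")     # bytes 0x95 0x5C
print(raw.hex(" "), "->", add_prefix(raw, b"\\", b"\\").hex(" "))
```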

    --no-cp932ext
        Handle the characters extended in CP932 as unassigned characters.

    --no-best-fit-chars
        When converting from Unicode to an encoded byte sequence, don't
        convert characters that are not round-trip safe. When converting
        from Unicode to Unicode, with this and the -x option, nkf can be
        used as a UTF converter. (In other words, without this and the -x
        option, nkf doesn't preserve some characters.)

        When nkf converts strings that relate to paths, you should use this
        option.

    --cap-input
        Decode hex encoded characters.

    --url-input
        Unescape percent escaped characters.

    --numchar-input
        Decode character reference, such as "&#....;".
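
        For reference, the percent-escaped and character-reference input
        forms look like this when decoded with Python's standard library.
        This illustrates the two notations only, assuming the escaped bytes
        are UTF-8; the hex format accepted by --cap-input is not described
        here and is not covered. The wrapper names are hypothetical.

```python
import html
from urllib.parse import unquote


def unescape_percent(text: str) -> str:
    # "%E6%97%A5" is three percent-escaped UTF-8 bytes.
    return unquote(text, encoding="utf-8")


def decode_char_refs(text: str) -> str:
    # Handles decimal (&#26085;) and hexadecimal (&#x672C;) references.
    return html.unescape(text)


print(unescape_percent("%E6%97%A5%E6%9C%AC"))
print(decode_char_refs("&#26085;&#x672C;"))
```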

    --  Ignore the rest of the -options.

AUTHOR
    Copyright (c) 1987, Fujitsu LTD. (Itaru ICHIKAWA).

    Copyright (c) 1996-2018, The nkf Project.