# uconv(1)

[UCONV(1)](https://www.chedong.com/phpMan.php/man/UCONV/1/markdown)                                   ICU 70.1 Manual                                  [UCONV(1)](https://www.chedong.com/phpMan.php/man/UCONV/1/markdown)



## NAME
       **uconv** - convert data from one encoding to another

## SYNOPSIS
       **uconv** [ **-h**, **-?**, **--help** ] [ **-V**, **--version** ] [ **-s**, **--silent** ] [ **-v**, **--verbose** ]
       [ **-l**, **--list** | **-l**, **--list-code** _code_ | **--default-code** | **-L**, **--list-transliterators** ]
       [ **--canon** ] [ **-x** _transliteration_ ] [ **--to-callback** _callback_ | **-c** ]
       [ **--from-callback** _callback_ | **-i** ] [ **--callback** _callback_ ] [ **--fallback** | **--no-fallback** ]
       [ **-b**, **--block-size** _size_ ] [ **-f**, **--from-code** _encoding_ ] [ **-t**, **--to-code** _encoding_ ]
       [ **--add-signature** ] [ **--remove-signature** ] [ **-o**, **--output** _file_ ] [ _file_... ]

## DESCRIPTION
       **uconv** converts, or transcodes, each given _file_ (or its standard input if no _file_ is specified)
       from one _encoding_ to another. The transcoding is done using Unicode as a pivot encoding
       (i.e. the data are first transcoded from their original encoding to Unicode, and then
       from Unicode to the destination encoding).
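
       The pivot model described above can be illustrated outside of ICU. The following is a
       minimal Python sketch, not **uconv**'s implementation: the helper name `transcode` is
       hypothetical, and the encoding names are Python codec names, which are not guaranteed to
       match ICU's.

```python
def transcode(data: bytes, from_code: str, to_code: str) -> bytes:
    """Transcode bytes through a Unicode pivot: bytes -> str -> bytes."""
    text = data.decode(from_code)   # original encoding -> Unicode (the pivot)
    return text.encode(to_code)     # Unicode -> destination encoding


# "café" stored as Latin-1 uses one byte for "é"; in UTF-8 it uses two.
latin1_bytes = "café".encode("latin-1")
utf8_bytes = transcode(latin1_bytes, "latin-1", "utf-8")
print(len(latin1_bytes), len(utf8_bytes))
```

       Because every conversion goes through Unicode, supporting a new encoding only requires a
       mapping to and from Unicode rather than one mapping per pair of encodings.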

       If an _encoding_ is not specified or is **-**, the default encoding is used. Thus, calling **uconv**
       with no _encoding_ provides an easy way to validate and sanitize data files for further
       consumption by tools requiring data in the default encoding.

       When calling **uconv**, it is possible to specify callbacks that are used to handle invalid
       characters in the input, or characters that cannot be transcoded to the destination encoding.
       Some encodings, for example, offer a default substitution character that can be used to
       represent the occurrence of such characters in the input. Other callbacks offer a useful visual
       representation of the invalid data.
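
       To give a feel for what such callbacks do, the sketch below uses Python's codec error
       handlers, which are rough analogues of some of the callbacks listed in the **CALLBACKS**
       section. This is an illustration under that assumption only: the handler names
       (`strict`, `ignore`, `replace`, `xmlcharrefreplace`) belong to Python, not to **uconv**.

```python
invalid_utf8 = b"a\xffb"  # the byte 0xFF can never occur in well-formed UTF-8

# Analogue of "stop": fail on the first invalid byte.
try:
    invalid_utf8.decode("utf-8", errors="strict")
    stopped = False
except UnicodeDecodeError:
    stopped = True

# Analogue of "skip": drop the invalid data.
skipped = invalid_utf8.decode("utf-8", errors="ignore")

# Analogue of "substitute": U+FFFD when transcoding to Unicode.
substituted = invalid_utf8.decode("utf-8", errors="replace")

# Analogue of "escape-xml-dec": the euro sign has no Latin-1 mapping,
# so it is written as a decimal character reference instead.
escaped = "é€".encode("latin-1", errors="xmlcharrefreplace")

print(stopped, skipped, substituted, escaped)
```

       The first three handlers act on the way into Unicode, as **--from-callback** does, and the
       last acts on the way out, as **--to-callback** does.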

       **uconv**  can  also  run  the  specified  _transliteration_  on the transcoded data, in which case
       transliteration will happen as an intermediate step, after the data have been  transcoded  to
       Unicode.   The  _transliteration_  can  be  either a list of semicolon-separated transliterator
       names, or an arbitrarily complex set of rules in the ICU transliteration rules format.

       For transcoding purposes, **uconv** options are compatible with those of [**iconv**(1)](https://www.chedong.com/phpMan.php/man/iconv/1/markdown), making it easy
       to  replace  it  in scripts. It is not necessarily the case, however, that the encoding names
       used by **uconv** and ICU are the same as the ones used by [**iconv**(1)](https://www.chedong.com/phpMan.php/man/iconv/1/markdown).  Also, options that  provide
       informational  data,  such  as  the  **-l**, **--list** one offered by some [**iconv**(1)](https://www.chedong.com/phpMan.php/man/iconv/1/markdown) variants such as
       GNU's, produce data in a slightly different and easier to parse format.

## OPTIONS
### -h -? --help
              Print help about usage and exit.

### -V --version
              Print the version of **uconv** and exit.

### -s --silent
              Suppress messages during execution.

### -v --verbose
              Display extra informative messages during execution.

### -l --list
              List all the available encodings and exit.

### -l --list-code
              List only the _code_ encoding and exit. If _code_ is not a proper encoding, exit  with  an
              error.

### --default-code
              List only the name of the default encoding and exit.

### -L --list-transliterators
              List all the available transliterators and exit.

### --canon
              If used with **-l**, **--list** or **--default-code**, the list of encodings is produced in a
              format compatible with [**convrtrs.txt**(5)](https://www.chedong.com/phpMan.php/man/convrtrs.txt/5/markdown). If used with **-L**, **--list-transliterators**, print
              only one transliterator name per line.

### -x
              Run the given _transliteration_ on the transcoded Unicode data, and use the
              transliterated data as input for the transcoding to the destination encoding.

### --to-callback
              Use _callback_ to handle characters that cannot be transcoded to the destination
              encoding. See section **CALLBACKS** for details on valid callbacks.

### -c
              Omit invalid characters from the output. Same as **--to-callback** **skip**.

### --from-callback
              Use _callback_ to handle characters that cannot be transcoded from the original
              encoding. See section **CALLBACKS** for details on valid callbacks.

### -i
              Ignore invalid sequences in the input. Same as **--from-callback** **skip**.

### --callback
              Use _callback_ to handle both characters that cannot be transcoded from the original
              encoding and characters that cannot be transcoded to the destination encoding. See
              section **CALLBACKS** for details on valid callbacks.

### --fallback
              Use the fallback mapping when transcoding from Unicode to the destination encoding.

### --no-fallback
              Do not use the fallback mapping when transcoding from Unicode to the destination
              encoding. This is the default.

### -b --block-size
              Read input in blocks of _size_ bytes at a time. The default block size is 4096.

### -f --from-code
              Set the original encoding of the data to _encoding_.

### -t --to-code
              Transcode the data to _encoding_.

### --add-signature
              Add  a  U+FEFF Unicode signature character (BOM) if the output charset supports it and
              does not add one anyway.

### --remove-signature
              Remove a U+FEFF Unicode signature character (BOM).
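
              What adding or removing a signature means at the byte level can be sketched in
              Python. This is an analogy, not **uconv**'s behavior: the codec names `utf-8-sig`
              and `utf-16` are Python's, and Python's `utf-16` codec serves here as an example
              of an output charset that writes a signature on its own.

```python
import codecs

text = "hi"

# Plain UTF-8 output carries no signature unless one is requested.
plain = text.encode("utf-8")

# Adding a signature: U+FEFF is written first, encoded as EF BB BF in UTF-8.
signed = text.encode("utf-8-sig")

# Some charsets add a signature anyway; Python's generic UTF-16 codec is one.
utf16 = text.encode("utf-16")

# Removing a signature: a leading U+FEFF is dropped while decoding.
stripped = signed.decode("utf-8-sig")

print(plain, signed, utf16[:2], stripped)
```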

### -o --output
              Write the transcoded data to _file_.

## CALLBACKS
       **uconv** supports specifying callbacks to handle invalid data. Callbacks can be set for both
       directions of transcoding: from the original encoding to Unicode, with the **--from-callback**
       option, and from Unicode to the destination encoding, with the **--to-callback** option.

       The following is a list of valid _callback_ names, along with a description of their  behavior.
       The  list  of  callbacks  actually supported by **uconv** is displayed when it is called with **-h**,
       **--help**.

       **substitute**       Write the encoding's substitute sequence, or the Unicode replacement character
                        **U+FFFD** when transcoding to Unicode.

       **skip**             Ignore the invalid data.

       **stop**             Stop  with  an  error  when  encountering invalid data.  This is the default
                        callback.

       **escape**           Same as **escape-icu**.

       **escape-icu**       Replace the missing characters with a string of the format **%U**_hhhh_ for  plane
                        0 characters, and **%U**_hhhh_**%U**_hhhh_ for planes 1 and above characters, where _hhhh_
                        is the hexadecimal value of one of the UTF-16 code  units  representing  the
                        character.  Characters  from  planes  1  and  above are written as a pair of
                        UTF-16 surrogate code units.

       **escape-java**      Replace the missing characters with a string of the format **\u**_hhhh_ for  plane
                        0 characters, and **\u**_hhhh_**\u**_hhhh_ for planes 1 and above characters, where _hhhh_
                        is the hexadecimal value of one of the UTF-16 code  units  representing  the
                        character.  Characters  from  planes  1  and  above are written as a pair of
                        UTF-16 surrogate code units.

       **escape-c**         Replace the missing characters with a string of the format **\u**_hhhh_ for  plane
                        0  characters,  and **\U**_hhhhhhhh_ for planes 1 and above characters, where _hhhh_
                        and _hhhhhhhh_ are the hexadecimal values of the Unicode codepoint.

       **escape-xml**       Same as **escape-xml-hex**.

       **escape-xml-hex**   Replace the missing characters with a string of the format  **&#x**_hhhh_**;**,  where
                        _hhhh_ is the hexadecimal value of the Unicode codepoint.

       **escape-xml-dec**   Replace  the  missing  characters with a string of the format **&#**_nnnn_**;**, where
                        _nnnn_ is the decimal value of the Unicode codepoint.

       **escape-unicode**   Replace the missing characters with a string of the format **{U+**_hhhh_**}**, where
                        _hhhh_ is the hexadecimal value of the Unicode codepoint. That hexadecimal
                        string is of variable length and can use from 4 to 6 digits. This is the
                        format universally used to denote a Unicode codepoint in the literature,
                        delimited by curly braces for easy recognition of those substitutions in the
                        output.
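
       The escape formats above differ mainly in whether they are built from UTF-16 code units
       or from the code point itself. The Python sketch below renders one code point in each
       format as described in this section. It is not ICU code: the function names are
       hypothetical, and the use of uppercase hex digits and the absence of zero padding in the
       XML forms are assumptions.

```python
def utf16_units(cp):
    """Return the UTF-16 code units (one, or a surrogate pair) for a code point."""
    if cp < 0x10000:
        return [cp]
    offset = cp - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


def escape(cp, style):
    """Format code point `cp` according to the named escape callback."""
    if style == "escape-icu":       # %Uhhhh per UTF-16 code unit
        return "".join(f"%U{u:04X}" for u in utf16_units(cp))
    if style == "escape-java":      # \uhhhh per UTF-16 code unit
        return "".join(f"\\u{u:04X}" for u in utf16_units(cp))
    if style == "escape-c":         # \uhhhh for plane 0, \Uhhhhhhhh above it
        return f"\\u{cp:04X}" if cp < 0x10000 else f"\\U{cp:08X}"
    if style == "escape-xml-hex":   # hexadecimal character reference
        return f"&#x{cp:X};"
    if style == "escape-xml-dec":   # decimal character reference
        return f"&#{cp};"
    if style == "escape-unicode":   # {U+hhhh}, 4 to 6 hex digits
        return f"{{U+{cp:04X}}}"
    raise ValueError(f"unknown escape style: {style}")


for style in ("escape-icu", "escape-java", "escape-c",
              "escape-xml-hex", "escape-xml-dec", "escape-unicode"):
    print(style, escape(0x30AB, style), escape(0x1F600, style))
```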

## EXAMPLES
       Convert data from a given _encoding_ to the platform encoding:

           $ **uconv** **-f** _encoding_

       Check if a _file_ contains valid data for a given _encoding_:

           $ **uconv** **-f** _encoding_ **-c** _file_ **>/dev/null**

       Convert  a  UTF-8 _file_ to a given _encoding_ and ensure that the resulting text is good for any
       version of HTML:

           $ **uconv** **-f** **utf-8** **-t** _encoding_ **\**
               **--callback** **escape-xml-dec** _file_

       Display the names of the Unicode code points in a UTF-8 file:

           $ **uconv** **-f** **utf-8** **-x** **any-name** _file_

       Print the name of a Unicode code point whose value is known (**U+30AB** in this example):

           $ **echo** **'\u30ab'** **|** **uconv** **-x** **'hex-any;** **any-name';** **echo**
           {KATAKANA LETTER KA}{LINE FEED}
           $

       (The names are delimited by curly braces. The name of the line terminator is also
       displayed.)

       Normalize  UTF-8  data using Unicode NFKC, remove all control characters, and map Katakana to
       Hiragana:

           $ **uconv** **-f** **utf-8** **-t** **utf-8** **\**
                 **-x** **'::nfkc;** **[:Cc:]** **>;** **::katakana-hiragana;'**
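
       For comparison, the three rules in this transliteration can be approximated with
       Python's standard library. This is a sketch only: the function name is hypothetical, and
       the Katakana step is a plain code point shift over U+30A1..U+30F6, which is an assumption
       and may not match every mapping of ICU's Katakana-Hiragana transliterator.

```python
import unicodedata


def nfkc_strip_controls_to_hiragana(text):
    """Approximate '::nfkc; [:Cc:] >; ::katakana-hiragana;' on a string."""
    # ::nfkc;  normalize to Unicode NFKC
    text = unicodedata.normalize("NFKC", text)
    # [:Cc:] >;  delete every control character (general category Cc)
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cc")
    # ::katakana-hiragana;  Katakana letters sit 0x60 above their Hiragana forms
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


# Half-width Katakana plus a BEL and a line feed; NFKC widens the letters first.
print(nfkc_strip_controls_to_hiragana("\uff76\uff80\uff76\uff85\x07\n"))
```

       Note that the line feed is removed along with the other control characters, since it also
       belongs to category Cc.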

## CAVEATS AND BUGS
       **uconv** reports errors as occurring at the first invalid byte encountered. This may be
       confusing to users of GNU [**iconv**(1)](https://www.chedong.com/phpMan.php/man/iconv/1/markdown), which reports errors as occurring at the first byte of an
       invalid sequence. For multi-byte character sets or encodings, this means that **uconv** error
       positions may be at a later offset in the input stream than would be the case with GNU
       [**iconv**(1)](https://www.chedong.com/phpMan.php/man/iconv/1/markdown).

       The reporting of error positions when a transliterator is used may be inaccurate or
       unavailable, in which case **uconv** will report the offset in the output stream at which the
       error occurred.

## AUTHORS
       Jonas Utterstroem
       Yves Arrouye

## VERSION
       70.1

## COPYRIGHT
       Copyright (C) 2000-2005 IBM, Inc. and others.

## SEE ALSO
       [**iconv**(1)](https://www.chedong.com/phpMan.php/man/iconv/1/markdown)



ICU MANPAGE                                  2005-jul-1                                     [UCONV(1)](https://www.chedong.com/phpMan.php/man/UCONV/1/markdown)
