# Unicode::Map

[Map(3pm)](https://www.chedong.com/phpMan.php/man/Map/3pm/markdown)              User Contributed Perl Documentation             [Map(3pm)](https://www.chedong.com/phpMan.php/man/Map/3pm/markdown)

NAME
       [Unicode::Map](https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AMap/markdown) V0.112 - maps charsets from and to utf16 unicode

SYNOPSIS
           use Unicode::Map();

           $Map = new Unicode::Map("ISO-8859-1");

           $utf16 = $Map -> to_unicode ("Hello world!");
             # => $utf16 eq "\0H\0e\0l\0l\0o\0 \0w\0o\0r\0l\0d\0!"

           $locale = $Map -> from_unicode ($utf16);
             # => $locale eq "Hello world!"

       A more detailed description follows below.

       To do: short note about perl's Unicode perspectives.

DESCRIPTION
       This module converts strings from and to the 2-byte Unicode UCS2
       format. All mappings go through 2-byte UTF16 encodings, not through
       the variable-length UTF8 encoding. To convert between UTF8 and UTF16
       use [Unicode::String](https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AString/markdown).

       For historical reasons this module coexists with [Unicode::Map8](https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AMap8/markdown).  Please
       use [Unicode::Map8](https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AMap8/markdown) unless you need to handle two-byte character sets,
       e.g. Chinese GB2312. If you stick to the basic functionality (see the
       conversion methods below) you can use both modules interchangeably.

       In practice this module will disappear sooner or later, as Unicode
       mapping support needs to find its way into perl's core somehow. If
       you would like to work in this field, please don't hesitate to contact
       Gisle Aas!

       This module can't deal directly with utf8. Use [Unicode::String](https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AString/markdown) to
       convert utf8 to utf16 and vice versa.
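
       As an illustration of the utf8 <-> utf16 step, here is a minimal
       sketch. It does not use Unicode::String, which the text recommends;
       it swaps in the core Encode module instead, so that it runs on a
       stock perl. The sub names are hypothetical, and big-endian byte order
       without a byte order mark is assumed because that is the layout the
       SYNOPSIS shows.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Convert UTF-8 bytes to big-endian UTF-16 bytes (no byte order mark).
# Hypothetical helper, not part of Unicode::Map or Unicode::String.
sub utf8_to_utf16be {
    my ($utf8_bytes) = @_;
    return encode('UTF-16BE', decode('UTF-8', $utf8_bytes));
}

# Convert big-endian UTF-16 bytes back to UTF-8 bytes.
sub utf16be_to_utf8 {
    my ($utf16_bytes) = @_;
    return encode('UTF-8', decode('UTF-16BE', $utf16_bytes));
}

my $utf16 = utf8_to_utf16be("Hi");

# Show the bytes in hex: each character takes two bytes, high byte first.
print join(' ', map { sprintf '%02x', $_ } unpack('C*', $utf16)), "\n";
# prints "00 48 00 69"
```

       The result of utf8_to_utf16be is a string of the kind from_unicode
       expects, and utf16be_to_utf8 can post-process what to_unicode returns.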

       Characters are mapped according to the data of the binary mapfiles in
       the [Unicode::Map](https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AMap/markdown) hierarchy. Binary mapfiles can also be created with this
       module, which lets you install your own character sets. Refer to
       mkmapfile or the file REGISTRY in the [Unicode::Map](https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AMap/markdown) hierarchy.

CONVERSION METHODS
       Probably these are the only methods you will need from this module.
       Their usage is compatible with [Unicode::Map8](https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AMap8/markdown).

       new $Map = new Unicode::Map("GB2312-80")

           Returns a new Map object for GB2312-80 encoding.

       from_unicode
           $dest = $Map -> from_unicode ($src)

           Creates a string in the locale charset representation from the
           utf16 encoded string $src.

       to_unicode
           $dest   = $Map -> to_unicode ($src)

           Creates a string in utf16 representation from $src.

       to8 Alias for from_unicode. For compatibility with [Unicode::Map8](https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AMap8/markdown).

       to16
           Alias for to_unicode. For compatibility with [Unicode::Map8](https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AMap8/markdown).
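
       To make the utf16 representation concrete, the following sketch takes
       the first characters of the string shown in the SYNOPSIS and inspects
       them with plain pack/unpack. It does not call Unicode::Map itself;
       the variable names are only for illustration.

```perl
use strict;
use warnings;

# The start of the utf16 string from the SYNOPSIS:
# two bytes per character, high byte first.
my $utf16 = "\0H\0e\0l\0l\0o";

# "n*" reads a list of unsigned 16-bit big-endian values.
my @code_points = unpack('n*', $utf16);
print join(' ', map { sprintf 'U+%04X', $_ } @code_points), "\n";
# prints "U+0048 U+0065 U+006C U+006C U+006F"

# The same template rebuilds the byte string from the code points.
my $rebuilt = pack('n*', @code_points);
print $rebuilt eq $utf16 ? "identical\n" : "different\n";
# prints "identical"
```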

WARNINGS
       You can make [Unicode::Map](https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AMap/markdown) issue warnings about deprecated or
       incompatible usage with the constants WARN_DEFAULT, WARN_DEPRECATION
       and WARN_COMPATIBILITY.  The latter two can be ORed together.

       No special warnings:
           $Unicode::Map::WARNINGS = Unicode::Map::WARN_DEFAULT

       Warnings for deprecated usage:
           $Unicode::Map::WARNINGS = Unicode::Map::WARN_DEPRECATION

       Warnings for incompatible usage:
           $Unicode::Map::WARNINGS = Unicode::Map::WARN_COMPATIBILITY

MAINTENANCE METHODS
       Note: These methods are solely for the maintenance of [Unicode::Map](https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AMap/markdown).
       Using any of these methods will lead to programs incompatible with
       [Unicode::Map8](https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AMap8/markdown).

       alias
           @list = $Map -> alias ($csid)

           Returns a list of alias names of character set $csid.

       mapping
           $path = $Map -> mapping ($csid)

           Returns the absolute path of binary character mapping for character
           set $csid according to REGISTRY file of [Unicode::Map](https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AMap/markdown).

       id  $real_id||"" = $Map -> id ($test_id)

           Returns a valid character set identifier $real_id if $test_id is a
           valid character set name or alias name according to the REGISTRY
           file of [Unicode::Map](https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AMap/markdown). As the signature indicates, an empty string is
           returned otherwise.

       ids @ids = $Map -> ids()

           Returns a list of all character set names defined in REGISTRY file.

       read_text_mapping
           1||0 = $Map -> read_text_mapping ($csid, $path, $style)

           Reads a text mapping of style $style named $csid from the file
           $path.  The mapping can then be saved to a file with the method
           write_binary_mapping.  $style can be:

            style          description

            "unicode"    A text mapping as of ftp://ftp.unicode.org/MAPPINGS/
            ""           Same as "unicode"
            "reverse"    Similar to unicode, but both columns are switched
            "keld"       A text mapping as of ftp://dkuug.dk/i18n/charmaps/

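           To illustrate the "unicode" and "reverse" styles, here is a
           minimal sketch of a parser for such a table. It assumes the
           layout used by the Unicode consortium mapping files: a hex
           charset code, a hex Unicode code, and a "#" comment per line.
           parse_text_mapping is a hypothetical helper and not the parser
           that read_text_mapping uses internally.

```perl
use strict;
use warnings;

# Parse a "unicode" style table into a hash: charset code => Unicode code.
# With $style eq "reverse" the two columns are expected in swapped order.
sub parse_text_mapping {
    my ($text, $style) = @_;
    my %to_unicode;
    for my $line (split /\n/, $text) {
        $line =~ s/#.*//;    # strip comments
        my ($left, $right) =
            $line =~ /^\s*0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)/
            or next;         # skip blank lines and undefined entries
        ($left, $right) = ($right, $left)
            if defined $style && $style eq 'reverse';
        $to_unicode{ hex $left } = hex $right;
    }
    return \%to_unicode;
}

# Made-up sample data in the assumed layout.
my $sample = <<'END';
# charset code, Unicode code, name
0x41  0x0041  # LATIN CAPITAL LETTER A
0xE4  0x00E4  # LATIN SMALL LETTER A WITH DIAERESIS
END

my $map = parse_text_mapping($sample, 'unicode');
printf "0x%02X => U+%04X\n", $_, $map->{$_}
    for sort { $a <=> $b } keys %$map;
```
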
       src $path = $Map -> src ($csid)

           Returns the path of textual character mapping for character set
           $csid according to REGISTRY file of [Unicode::Map](https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AMap/markdown).

       style
           $style = $Map -> style ($csid)

           Returns the style of textual character mapping for character set
           $csid according to REGISTRY file of [Unicode::Map](https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AMap/markdown).

       write_binary_mapping
           1||0 = $Map -> write_binary_mapping ($csid, $path)

           Stores a mapping that has been loaded via method read_text_mapping
           in file $path.

DEPRECATED METHODS
       Some functionality is no longer promoted.

       noise
           Deprecated! Don't use any longer.

       reverse_unicode
           Deprecated! Use [Unicode::String::byteswap](https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AString%3A%3Abyteswap/markdown) instead.

BINARY MAPPINGS
       Structure of binary Mapfiles

       Unicode character mapping tables contain runs of sequential key codes
       and sequential value codes. This property is used to crunch the maps
       easily.  A run of n (0<n<256) sequential characters is represented by
       a byte count n and the first character code key_start. For each such
       run the corresponding value sequences are crunched in the same way.
       The value 0 is used to start an extended information block (which is
       only partially implemented, though).
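
       The crunching idea can be sketched as follows. This is an
       illustration of the description above, not the code Unicode::Map
       uses: crunch is a hypothetical helper that works on an in-memory hash
       and returns nested arrays instead of writing bytes, and the sample
       mapping is made up.

```perl
use strict;
use warnings;

# Turn a hash (key code => value code) into a list of key sequences.
# Each entry is [ n, key_start, [ [ m, val_start ], ... ] ] where the
# m values of one entry add up to n.
sub crunch {
    my ($map) = @_;
    my @keys = sort { $a <=> $b } keys %$map;
    my @key_seqs;
    while (@keys) {
        # Collect up to 255 consecutive key codes (0 < n < 256).
        my @run = (shift @keys);
        push @run, shift @keys
            while @keys && @run < 255 && $keys[0] == $run[-1] + 1;

        # Within this key run, group consecutive value codes.
        my @val_seqs;
        for my $key (@run) {
            my $val = $map->{$key};
            if (@val_seqs && $val == $val_seqs[-1][1] + $val_seqs[-1][0]) {
                $val_seqs[-1][0]++;           # extends the current run
            }
            else {
                push @val_seqs, [ 1, $val ];  # starts a new run
            }
        }
        push @key_seqs, [ scalar @run, $run[0], \@val_seqs ];
    }
    return \@key_seqs;
}

# Made-up mapping: five consecutive keys, then one isolated key.
my %sample = (
    0x41 => 0x0041, 0x42 => 0x0042, 0x43 => 0x0043,
    0x44 => 0x0391, 0x45 => 0x0392,
    0x80 => 0x20AC,
);

for my $seq (@{ crunch(\%sample) }) {
    my ($n, $key_start, $val_seqs) = @$seq;
    printf "n=%d key_start=0x%02X values: %s\n", $n, $key_start,
        join(', ', map { sprintf 'm=%d val_start=0x%04X', @$_ } @$val_seqs);
}
```

       Six key/value pairs shrink to two key sequences with three value
       sequences in total, which is where the space saving comes from.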

       One could think of two ways to build a binary mapfile. The first
       method would write a list of all key codes, followed by a list of all
       value codes. The second method, used here, appends to each partial
       key code list the corresponding crunched value code lists. This keeps
       value codes a little closer to their key codes.

       Note: the file format is still in a very liquid state. Do not rely on
       it staying as it is, on this description being free of bugs, or on
       all features being implemented.

       STRUCTURE:

       <main>:
              offset  structure     value

              0x00    word          0x27b8   (magic)
              0x02    @(<extended> || <submapping>)

           The mapfile ends with extended mode <end> in main stream.

       <submapping>:
              0x00    byte != 0     charsize1 (bits)
              0x01    byte          n1 number of chars for one entry
              0x02    byte          charsize2 (bits)
              0x03    byte          n2 number of chars for one entry
              0x04    @(<extended> || <key_seq> || <key_val_seq>)

              bs1=int((charsize1+7)/8), bs2=int((charsize2+7)/8)

           A submapping ends when a <mapend> entry occurs.
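
           The four header bytes and the bs1/bs2 formula can be sketched
           like this. parse_submapping_header is a hypothetical helper, and
           the sample values (an 8 bit charset mapped to 16 bit codes) are
           assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Decode the four single-byte fields at the start of a <submapping>
# and derive the byte sizes bs1 and bs2 from the bit sizes.
sub parse_submapping_header {
    my ($bytes) = @_;
    my ($charsize1, $n1, $charsize2, $n2) = unpack('C4', $bytes);
    return {
        charsize1 => $charsize1,
        n1        => $n1,
        charsize2 => $charsize2,
        n2        => $n2,
        bs1       => int(($charsize1 + 7) / 8),   # bits rounded up to bytes
        bs2       => int(($charsize2 + 7) / 8),
    };
}

# Assumed sample header: 8 bit keys, 16 bit values, one char per entry.
my $header = parse_submapping_header(pack('C4', 8, 1, 16, 1));
print "bs1=$header->{bs1} bs2=$header->{bs2}\n";
# prints "bs1=1 bs2=2"
```

           Adding 7 before the integer division rounds up, so a 7 bit
           charset still occupies one byte and a 9 bit one occupies two.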

       <key_val_seq>:
              0x00    size=0|1|2|4  n, number of sequential characters
              size    bs1           key1
              +bs1    bs2           value1
              +bs2    bs1           key2
              +bs1    bs2           value2
              ...

           A key_val_seq ends when either the file ends (n = infinite mode)
           or n pairs have been read.

       <key_seq>:
              0x00    byte          n, number of sequential characters
              0x01    bs1           key_start, first character of sequence
              1+bs1   @(<extended> || <val_seq>)

           A key sequence starts with a byte count telling how long the
           sequence is. It is followed by the key start code. After this comes
           a list of value sequences. The list of value sequences ends, if
           sum(m) equals n.

       <val_seq>:
              0x00    byte          m, number of sequential characters
              0x01    bs2           val_start, first character of sequence
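
           How a <key_seq> and its <val_seq> entries expand back into
           key/value pairs can be sketched as follows. expand_key_seq is a
           hypothetical helper that takes already decoded numbers, so it
           makes no assumption about the byte order of key_start and
           val_start in the file; the sample values are made up.

```perl
use strict;
use warnings;

# Expand one key sequence (n, key_start) with its value sequences
# ([m, val_start], ...) into a hash: key code => value code.
sub expand_key_seq {
    my ($n, $key_start, $val_seqs) = @_;
    my %pairs;
    my $key = $key_start;
    for my $seq (@$val_seqs) {
        my ($m, $val_start) = @$seq;
        # Each value sequence covers the next m keys.
        $pairs{ $key++ } = $val_start + $_ for 0 .. $m - 1;
    }
    # The list of value sequences is complete when sum(m) equals n.
    my $covered = $key - $key_start;
    die "value sequences cover $covered keys, expected $n\n"
        unless $covered == $n;
    return \%pairs;
}

# Made-up sample: 5 keys from 0x41, values in runs of 3 and 2.
my $pairs = expand_key_seq(5, 0x41, [ [ 3, 0x0041 ], [ 2, 0x0391 ] ]);
printf "0x%02X => 0x%04X\n", $_, $pairs->{$_}
    for sort { $a <=> $b } keys %$pairs;
```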

       <extended>:
              0x00    byte          0
              0x01    byte          ftype
              0x02    byte          fsize, size of following structure
              0x03    fsize bytes   something

           For future extensions or private use one can insert streams of
           1..255 bytes here. ftype can have values 30..255; values 0..29 are
           reserved. The modes are not fully defined yet and could change.
           They will be explained later.
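
           Since an <extended> block carries its own length, a reader can
           step over blocks it does not understand. A minimal sketch, with
           read_extended as a hypothetical helper and a made-up payload:

```perl
use strict;
use warnings;

# Read one <extended> block starting at byte offset $pos in $buf.
# Returns (ftype, payload, offset of the byte after the block).
sub read_extended {
    my ($buf, $pos) = @_;
    die "buffer too short for an extended header\n"
        if length($buf) < $pos + 3;
    my ($zero, $ftype, $fsize) = unpack('C3', substr($buf, $pos, 3));
    die "not an extended block\n" unless $zero == 0;
    my $payload = substr($buf, $pos + 3, $fsize);
    die "truncated extended block\n" unless length($payload) == $fsize;
    return ($ftype, $payload, $pos + 3 + $fsize);
}

# Made-up block: marker 0, ftype 30, fsize 4, then 4 payload bytes,
# followed by one unrelated byte.
my $buf = pack('C3 a4', 0, 30, 4, 'demo') . "\x05";

my ($ftype, $payload, $next) = read_extended($buf, 0);
print "ftype=$ftype payload=$payload next=$next\n";
# prints "ftype=30 payload=demo next=7"
```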

TO BE DONE
       -   Something clever, when a character has no translation.

       -   Direct charset -> charset mapping.

       -   Better performance.

       -   Support for mappings according to RFC 1345.

SEE ALSO
       -   File "REGISTRY" and binary mappings in directory "Unicode/Map" of
           your perl library path

       -   [recode(1)](https://www.chedong.com/phpMan.php/man/recode/1/markdown), [map(1)](https://www.chedong.com/phpMan.php/man/map/1/markdown), [mkmapfile(1)](https://www.chedong.com/phpMan.php/man/mkmapfile/1/markdown), Unicode::[Map(3)](https://www.chedong.com/phpMan.php/man/Map/3/markdown), Unicode::[Map8(3)](https://www.chedong.com/phpMan.php/man/Map8/3/markdown),
           Unicode::[String(3)](https://www.chedong.com/phpMan.php/man/String/3/markdown), Unicode::[CharName(3)](https://www.chedong.com/phpMan.php/man/CharName/3/markdown), [mirrorMappings(1)](https://www.chedong.com/phpMan.php/man/mirrorMappings/1/markdown)

       -   RFC 1345

       -   Mappings at Unicode consortium ftp://ftp.unicode.org/MAPPINGS/

       -   Registered Internet character sets ftp://dkuug.dk/i18n/charmaps/

       -   To do: more references

AUTHOR
       Martin Schwartz <martin@nacho.de>

perl v5.34.0                      2022-02-06                          [Map(3pm)](https://www.chedong.com/phpMan.php/man/Map/3pm/markdown)
