NAME
    Unicode::Map V0.112 - maps charsets from and to utf16 unicode

SYNOPSIS
        use Unicode::Map();

        $Map = new Unicode::Map("ISO-8859-1");

        $utf16 = $Map -> to_unicode ("Hello world!");
          # => $utf16 eq "\0H\0e\0l\0l\0o\0 \0w\0o\0r\0l\0d\0!"

        $locale = $Map -> from_unicode ($utf16);
          # => $locale eq "Hello world!"

    A more detailed description below.

    2do: short note about perl's Unicode perspectives.

DESCRIPTION
    This module converts strings from and to the 2-byte Unicode UCS2 format. All mappings go through
    2-byte UTF16 encodings, not through the variable-length UTF8 encoding. To convert between UTF8
    and UTF16, use Unicode::String.

    For historical reasons this module coexists with Unicode::Map8. Please use Unicode::Map8 unless
    you need to handle two-byte character sets, e.g. Chinese GB2312. If you stick to the basic
    functionality (see documentation), you can use either module interchangeably.

    In practice this module will disappear sooner or later, as Unicode mapping support needs to get
    into perl's core somehow. If you would like to work in this field, please don't hesitate to
    contact Gisle Aas!

    This module can't deal directly with utf8. Use Unicode::String to convert utf8 to utf16 and vice
    versa.
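
    To make the utf8 to utf16 step concrete, here is a minimal sketch that uses perl's core Encode
    module instead of Unicode::String. Encode is a swapped-in alternative and is not what this
    module itself uses; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Gruesse" with u-umlaut and sharp s, as raw UTF-8 bytes
# (u-umlaut = C3 BC, sharp s = C3 9F).
my $utf8_bytes = "Gr\xc3\xbc\xc3\x9f" . "e";

# UTF-8 bytes -> perl characters -> big-endian UTF-16 bytes (no BOM).
my $utf16 = encode("UTF-16BE", decode("UTF-8", $utf8_bytes));

# And back again: UTF-16BE bytes -> characters -> UTF-8 bytes.
my $round_trip = encode("UTF-8", decode("UTF-16BE", $utf16));

printf "%d UTF-8 bytes <-> %d UTF-16 bytes\n",
    length($utf8_bytes), length($utf16);
```

    The resulting big-endian string has the same layout as the one shown in the SYNOPSIS, so it
    should be suitable as input for from_unicode.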

    Characters are mapped according to the binary mapfiles in the Unicode::Map hierarchy. Binary
    mapfiles can also be created with this module, which lets you install your own character sets.
    Refer to mkmapfile or to the file REGISTRY in the Unicode::Map hierarchy.

CONVERSION METHODS
    Probably these are the only methods you will need from this module. Their usage is compatible
    with Unicode::Map8.

    new *$Map* = new Unicode::Map("GB2312-80")

        Returns a new Map object for GB2312-80 encoding.

    from_unicode
        *$dest* = *$Map* -> from_unicode (*$src*)

        Creates a string in locale charset representation from utf16 encoded string *$src*.

    to_unicode
        *$dest* = *$Map* -> to_unicode (*$src*)

        Creates a string in utf16 representation from *$src*.

    to8 Alias for *from_unicode*. For compatibility with Unicode::Map8

    to16
        Alias for *to_unicode*. For compatibility with Unicode::Map8
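
    The byte layout these methods work with can be illustrated without the module. The sketch
    below is not Unicode::Map's implementation; it only reproduces the ISO-8859-1 case from the
    SYNOPSIS, where every byte equals its Unicode code point. The helper names latin1_to_utf16be
    and utf16be_to_latin1 are made up for this example.

```perl
use strict;
use warnings;

# Illustration only: ISO-8859-1 bytes equal their Unicode code points,
# so widening each byte to a big-endian 16-bit word gives the utf16 form.
sub latin1_to_utf16be {
    my ($src) = @_;
    return pack("n*", unpack("C*", $src));
}

# Reverse direction; only meaningful if every 16-bit unit is below 0x100.
sub utf16be_to_latin1 {
    my ($src) = @_;
    return pack("C*", unpack("n*", $src));
}

my $utf16  = latin1_to_utf16be("Hello world!");
my $locale = utf16be_to_latin1($utf16);

printf "%d bytes in locale charset, %d bytes in utf16\n",
    length($locale), length($utf16);
```

    For character sets such as GB2312-80 there is no such arithmetic shortcut, which is why the
    module looks each character up in a mapfile.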

WARNINGS
    You can ask Unicode::Map to issue warnings on deprecated or incompatible usage by setting
    $Unicode::Map::WARNINGS to one of the constants WARN_DEFAULT, WARN_DEPRECATION or
    WARN_COMPATIBILITY. The latter two can be combined with a bitwise or.

    No special warnings:
        $Unicode::Map::WARNINGS = Unicode::Map::WARN_DEFAULT

    Warnings for deprecated usage:
        $Unicode::Map::WARNINGS = Unicode::Map::WARN_DEPRECATION

    Warnings for incompatible usage:
        $Unicode::Map::WARNINGS = Unicode::Map::WARN_COMPATIBILITY

MAINTENANCE METHODS
    *Note:* These methods are solely for the maintenance of Unicode::Map. Using any of these
    methods will lead to programs incompatible with Unicode::Map8.

    alias
        *@list* = *$Map* -> alias (*$csid*)

        Returns a list of alias names of character set *$csid*.

    mapping
        *$path* = *$Map* -> mapping (*$csid*)

        Returns the absolute path of binary character mapping for character set *$csid* according to
        REGISTRY file of Unicode::Map.

    id  *$real_id*||"" = *$Map* -> id (*$test_id*)

        Returns a valid character set identifier *$real_id*, if *$test_id* is a valid character set
        name or alias name according to REGISTRY file of Unicode::Map.

    ids *@ids* = *$Map* -> ids()

        Returns a list of all character set names defined in REGISTRY file.

    read_text_mapping
        1||0 = *$Map* -> read_text_mapping (*$csid*, *$path*, *$style*)

        Read a text mapping of style *$style* named *$csid* from filename *$path*. The mapping can
        then be saved to a file with the method write_binary_mapping. *$style* can be:

         style        description

         "unicode"    A text mapping as of ftp://ftp.unicode.org/MAPPINGS/
         ""           Same as "unicode"
         "reverse"    Similar to unicode, but both columns are switched
         "keld"       A text mapping as of ftp://dkuug.dk/i18n/charmaps/

    src *$path* = *$Map* -> src (*$csid*)

        Returns the path of textual character mapping for character set *$csid* according to
        REGISTRY file of Unicode::Map.

    style
        *$style* = *$Map* -> style (*$csid*)

        Returns the style of textual character mapping for character set *$csid* according to
        REGISTRY file of Unicode::Map.

    write_binary_mapping
        1||0 = *$Map* -> write_binary_mapping (*$csid*, *$path*)

        Stores a mapping that has been loaded via method read_text_mapping in file *$path*.

DEPRECATED METHODS
    Some functionality is no longer promoted.

    noise
        Deprecated! Don't use any longer.

    reverse_unicode
        Deprecated! Use Unicode::String::byteswap instead.

BINARY MAPPINGS
    Structure of binary Mapfiles

    Unicode character mapping tables contain runs of sequential key codes and sequential value
    codes. This property is used to crunch the maps easily. n (0<n<256) sequential characters are
    represented as a byte count n and the first character code key_start. For these subsequences
    the corresponding value sequences are crunched together as well. The value 0 is used to start
    an extended information block (which is only partially implemented, though).

    One could think of two ways to make a binary mapfile. The first method would be to write a
    list of all key codes first, and then a list of all value codes. The second method, used here,
    appends to each partial key code list the corresponding crunched value code lists. This keeps
    value codes a little closer to their key codes.
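
    The crunching idea can be sketched on in-memory data. The following is an illustration under
    stated assumptions, not the code mkmapfile uses: the function name crunch and the returned
    structure are made up for this example, and nothing is written in the binary layout yet.

```perl
use strict;
use warnings;

# Group a key => value code mapping into runs: each run of sequential
# keys carries the runs of sequential values that belong to it.
# Returns a list of
#   { n => count, key_start => code, vals => [ [m, val_start], ... ] }
sub crunch {
    my (%map) = @_;
    my @runs;
    my $prev_key;
    for my $key (sort { $a <=> $b } keys %map) {
        my $val = $map{$key};
        my $run = $runs[-1];

        # Start a new key run when the keys stop being sequential or
        # the byte-sized counter n would overflow (n must stay < 256).
        if (!defined $prev_key || $key != $prev_key + 1 || $run->{n} == 255) {
            $run = { n => 0, key_start => $key, vals => [] };
            push @runs, $run;
        }

        # Extend the current value run if this value continues it,
        # otherwise open a new value run inside the same key run.
        my $vrun = $run->{vals}[-1];
        if ($vrun && $val == $vrun->[1] + $vrun->[0]) {
            $vrun->[0]++;
        } else {
            push @{ $run->{vals} }, [1, $val];
        }

        $run->{n}++;
        $prev_key = $key;
    }
    return @runs;
}

my @runs = crunch(
    0x41 => 0x0041, 0x42 => 0x0042, 0x43 => 0x0043,   # sequential keys and values
    0x44 => 0x00C4,                                   # key continues, value jumps
    0x80 => 0x20AC,                                   # gap in keys: new key run
);

for my $r (@runs) {
    printf "n=%d key_start=0x%02X vals=%s\n", $r->{n}, $r->{key_start},
        join(",", map { sprintf "%d at 0x%04X", @$_ } @{ $r->{vals} });
}
```

    Each returned run corresponds to one partial key code list followed directly by its value code
    lists, which is the second method described above.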

    Note: the file format is still in a very liquid state. Do not rely on it staying as it is, on
    this description being free of bugs, or on all features being implemented.

    STRUCTURE:

    <main>:
           offset  structure     value

           0x00    word          0x27b8   (magic)
           0x02    @(<extended> || <submapping>)

        The mapfile ends with extended mode <end> in main stream.

    <submapping>:
           0x00    byte != 0     charsize1 (bits)
           0x01    byte          n1 number of chars for one entry
           0x02    byte          charsize2 (bits)
           0x03    byte          n2 number of chars for one entry
           0x04    @(<extended> || <key_seq> || <key_val_seq>)

           bs1=int((charsize1+7)/8), bs2=int((charsize2+7)/8)

        One submapping ends when <mapend> entry occurs.

    <key_val_seq>:
           0x00    size=0|1|2|4  n, number of sequential characters
           size    bs1           key1
           +bs1    bs2           value1
           +bs2    bs1           key2
           +bs1    bs2           value2
           ...

        key_val_seq ends, if either file ends (n = infinite mode) or n pairs are read.

    <key_seq>:
           0x00    byte          n, number of sequential characters
           0x01    bs1           key_start, first character of sequence
           1+bs1   @(<extended> || <val_seq>)

        A key sequence starts with a byte count telling how long the sequence is. It is followed by
        the key start code. After this comes a list of value sequences. The list of value sequences
        ends when the sum of all m equals n.

    <val_seq>:
           0x00    byte          m, number of sequential characters
           0x01    bs2           val_start, first character of sequence
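
    Reading one <key_seq> with its <val_seq> entries can be sketched as follows. This is a
    minimal illustration, not the module's reader: big-endian byte order is an assumption (the
    description above does not state it), <extended> blocks are not handled, and the names
    byte_size, read_uint and expand_key_seq are made up for this example.

```perl
use strict;
use warnings;

# Bytes needed for one character of the given bit size,
# i.e. bs = int((charsize+7)/8) as in <submapping>.
sub byte_size { my ($bits) = @_; return int(($bits + 7) / 8) }

# Read one unsigned integer of $size bytes at $$pos and advance $$pos.
# ASSUMPTION: big-endian byte order.
sub read_uint {
    my ($buf, $pos, $size) = @_;
    my $v = 0;
    for (1 .. $size) {
        $v = ($v << 8) | ord(substr($buf, $$pos, 1));
        $$pos++;
    }
    return $v;
}

# Expand one <key_seq> starting at offset $pos into key => value pairs.
# Returns a hash reference and the offset just behind the sequence.
sub expand_key_seq {
    my ($buf, $pos, $bs1, $bs2) = @_;
    my %pairs;
    my $n   = read_uint($buf, \$pos, 1);      # number of sequential keys
    my $key = read_uint($buf, \$pos, $bs1);   # key_start
    my $sum = 0;
    while ($sum < $n) {                       # value list ends when sum(m) == n
        my $m = read_uint($buf, \$pos, 1);
        die "extended block or corrupt data before offset $pos\n" if $m == 0;
        my $val = read_uint($buf, \$pos, $bs2);
        for (1 .. $m) {
            $pairs{$key} = $val;
            $key++;
            $val++;
        }
        $sum += $m;
    }
    return (\%pairs, $pos);
}

# charsize1 = 8 bits, charsize2 = 16 bits
my ($bs1, $bs2) = (byte_size(8), byte_size(16));

# n=4, key_start=0x41, then <val_seq> (m=3, 0x0041) and <val_seq> (m=1, 0x00C4)
my $data = pack("C*", 0x04, 0x41,  0x03, 0x00, 0x41,  0x01, 0x00, 0xC4);

my ($pairs, $next) = expand_key_seq($data, 0, $bs1, $bs2);
printf "0x%02X => 0x%04X\n", $_, $pairs->{$_}
    for sort { $a <=> $b } keys %$pairs;
```

    The returned offset points at the next entry of the submapping, so a full reader would call
    such a routine in a loop until the <mapend> entry occurs.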

    <extended>:
           0x00    byte          0
           0x01    byte          ftype
           0x02    byte          fsize, size of following structure
           0x03    fsize bytes   something

        For future extensions or private use one can insert streams of 1..255 bytes here. ftype can
        have the values 30..255; the values 0..29 are reserved. The modes are not fully defined yet
        and could change. They will be explained later.

TO BE DONE
    -   Something clever, when a character has no translation.

    -   Direct charset -> charset mapping.

    -   Better performance.

    -   Support for mappings according to RFC 1345.

SEE ALSO
    -   File "REGISTRY" and binary mappings in directory "Unicode/Map" of your perl library path

    -   recode(1), map(1), mkmapfile(1), Unicode::Map(3), Unicode::Map8(3), Unicode::String(3),
        Unicode::CharName(3), mirrorMappings(1)

    -   RFC 1345

    -   Mappings at Unicode consortium ftp://ftp.unicode.org/MAPPINGS/

    -   Registered Internet character sets ftp://dkuug.dk/i18n/charmaps/

    -   2do: more references

AUTHOR
    Martin Schwartz <martin AT nacho.de>
