{
    "mode": "perldoc",
    "parameter": "Unicode::Map8",
    "section": "",
    "url": "https://www.chedong.com/phpMan.php/perldoc/Unicode%3A%3AMap8/json",
    "generated": "2026-06-10T10:41:05Z",
    "synopsis": "require Unicode::Map8;\nmy $nomap = Unicode::Map8->new(\"ISO646-NO\") || die;\nmy $l1map = Unicode::Map8->new(\"latin1\")    || die;\nmy $ustr = $nomap->to16(\"V}re norske tegn b|r {res\\n\");\nmy $lstr = $l1map->to8($ustr);\nprint $lstr;\nprint $nomap->tou(\"V}re norske tegn b|r {res\\n\")->utf8",
    "sections": {
        "NAME": {
            "content": "Unicode::Map8 - Mapping table between 8-bit chars and Unicode\n",
            "subsections": []
        },
        "SYNOPSIS": {
            "content": "require Unicode::Map8;\nmy $nomap = Unicode::Map8->new(\"ISO646-NO\") || die;\nmy $l1map = Unicode::Map8->new(\"latin1\")    || die;\n\nmy $ustr = $nomap->to16(\"V}re norske tegn b|r {res\\n\");\nmy $lstr = $l1map->to8($ustr);\nprint $lstr;\n\nprint $nomap->tou(\"V}re norske tegn b|r {res\\n\")->utf8\n",
            "subsections": []
        },
        "DESCRIPTION": {
            "content": "The *Unicode::Map8* class implements efficient mapping tables between 8-bit character sets and\n16-bit character sets like Unicode. The tables are efficient both in terms of space allocated and\ntranslation speed. The 16-bit strings are assumed to use network byte order.\n\nThe following methods are available:\n\n$m = Unicode::Map8->new( [$charset] )\nThe object constructor creates new instances of the Unicode::Map8 class. It takes an optional\nargument that specifies the name of an 8-bit character set to initialize mappings from. The\nargument can also be the name of a mapping file. If the charset/file cannot be located,\nthen the constructor returns *undef*.\n\nIf you omit the argument, then an empty mapping table is constructed. You must then add\nmapping pairs to it using the addpair() method described below.\n\n$m->addpair( $u8, $u16 );\nAdds a new mapping pair to the mapping object. It takes two arguments. The first is the code\nvalue in the 8-bit character set and the second is the corresponding code value in the\n16-bit character set. The same codes can be used multiple times (but using the same pair has\nno effect). The first definition for a code is the one that is used.\n\nConsider the following example:\n\n$m->addpair(0x20, 0x0020);\n$m->addpair(0x20, 0x00A0);\n$m->addpair(0xA0, 0x00A0);\n\nIt means that the characters 0x20 and 0xA0 in the 8-bit charset map to themselves in the\n16-bit set, but in the 16-bit character set 0x00A0 maps to 0x20.\n\n$m->defaultto8( $u8 )\nSets the code of the default character to use when mapping from 16-bit to 8-bit strings. If\nthere is no mapping pair defined for a character then this default is substituted by to8()\nand recode8().\n\n$m->defaultto16( $u16 )\nSets the code of the default character to use when mapping from 8-bit to 16-bit strings. 
If\nthere is no mapping pair defined for a character then this default is used by to16(), tou()\nand recode8().\n\n$m->nostrict;\nAll undefined mappings are replaced with the identity mapping. Undefined characters are\nnormally just removed (or replaced with the default if defined) when converting between\ncharacter sets.\n\n$m->to8( $ustr );\nConverts a 16-bit character string to the corresponding string in the 8-bit character set.\n\n$m->to16( $str );\nConverts an 8-bit character string to the corresponding string in the 16-bit character set.\n\n$m->tou( $str );\nSame as to16() but returns a Unicode::String object instead of a plain UCS2 string.\n\n$m->recode8($m2, $str);\nMaps the string $str from one 8-bit character set ($m) to another one ($m2). Since we assume\nwe know the mappings towards the common 16-bit encoding we can use this to convert between\nany of the 8-bit character sets.\n\n$m->tochar16( $u8 )\nMaps a single 8-bit character code to a 16-bit code. If the 8-bit character is unmapped\nthen the constant NOCHAR is returned. The default is not used and the callback method is not\ninvoked.\n\n$m->tochar8( $u16 )\nMaps a single 16-bit character code to an 8-bit code. If the 16-bit character is unmapped\nthen the constant NOCHAR is returned. The default is not used and the callback method is not\ninvoked.\n\nThe following callback methods are available. You can override these methods by creating a\nsubclass of Unicode::Map8.\n\n$m->unmappedto8\nWhen mapping to an 8-bit character string and there is no mapping defined (and no default\neither), then this method is called as the last resort. It is called with a single integer\nargument which is the code of the unmapped 16-bit character. It is expected to return a\nstring that will be incorporated in the 8-bit string. 
The default version of this method\nalways returns an empty string.\n\nExample:\n\npackage MyMapper;\n@ISA=qw(Unicode::Map8);\n\nsub unmappedto8\n{\nmy($self, $code) = @_;\nrequire Unicode::CharName;\n\"<\" . Unicode::CharName::uname($code) . \">\";\n}\n\n$m->unmappedto16\nLikewise, when mapping to a 16-bit character string and no mapping is defined then this method\nis called. It should return a 16-bit string with the bytes in network byte order. The\ndefault version of this method always returns an empty string.\n",
            "subsections": []
        },
        "FILES": {
            "content": "The *Unicode::Map8* constructor can parse two different file formats: a binary format and a\ntextual format.\n\nThe binary format is simple. It consists of a sequence of 16-bit integer pairs in network byte\norder. The first pair should contain the magic value 0xFFFE, 0x0001. Of each pair, the first\nvalue is the code of an 8-bit character and the second is the code of the 16-bit character. It\nfollows from this that the first value should be less than 256.\n\nThe textual format consists of lines that are either a comment (first non-blank character is '#'),\na completely blank line or a line with two hexadecimal numbers. The hexadecimal numbers must be\npreceded by \"0x\" as in C and Perl. This is the same format used by the Unicode mapping files\navailable from <URL:ftp://ftp.unicode.org/Public>.\n\nThe mapping table files are installed in the Unicode/Map8/maps directory somewhere in the Perl\n@INC path. The variable $Unicode::Map8::MAPSDIR is the complete path name to this directory.\nBinary mapping files are stored within this directory with the suffix *.bin*. Textual mapping\nfiles are stored with the suffix *.txt*.\n\nThe scripts *map8bin2txt* and *map8txt2bin* can translate between these mapping file formats.\n\nA special file called aliases within $MAPSDIR specifies all the alias names that can be used to\ndenote the various character sets. The first name on each line is the real file name and the\nrest are alias names separated by spaces.\n\nThe `\"umap --list\"' command can be used to list the character sets supported.\n",
            "subsections": []
        },
        "BUGS": {
            "content": "Does not handle Unicode surrogate pairs as a single character.\n",
            "subsections": []
        },
        "SEE ALSO": {
            "content": "",
            "subsections": [
                {
                    "name": "umap",
                    "content": ""
                }
            ]
        },
        "COPYRIGHT": {
            "content": "Copyright 1998 Gisle Aas.\n\nThis library is free software; you can redistribute it and/or modify it under the same terms as\nPerl itself.\n",
            "subsections": []
        }
    },
    "summary": "Unicode::Map8 - Mapping table between 8-bit chars and Unicode",
    "flags": [],
    "examples": [],
    "see_also": []
}