Map8(3pm) User Contributed Perl Documentation Map8(3pm)
NAME
Unicode::Map8 - Mapping table between 8-bit chars and Unicode
SYNOPSIS
require Unicode::Map8;
my $no_map = Unicode::Map8->new("ISO646-NO") || die;
my $l1_map = Unicode::Map8->new("latin1") || die;
my $ustr = $no_map->to16("V}re norske tegn b|r {res\n");
my $lstr = $l1_map->to8($ustr);
print $lstr;
print $no_map->tou("V}re norske tegn b|r {res\n")->utf8;
DESCRIPTION
The Unicode::Map8 class implements efficient mapping tables between 8-bit character sets
and 16-bit character sets like Unicode. The tables are efficient both in terms of space
allocated and translation speed. The 16-bit strings are assumed to use network byte order.
The following methods are available:
$m = Unicode::Map8->new( [$charset] )
The object constructor creates new instances of the Unicode::Map8 class. It takes an
optional argument that specifies the name of an 8-bit character set to initialize
mappings from. The argument can also be the name of a mapping file. If the
charset/file cannot be located, then the constructor returns undef.
If you omit the argument, then an empty mapping table is constructed. You must then
add mapping pairs to it using the addpair() method described below.
$m->addpair( $u8, $u16 );
Adds a new mapping pair to the mapping object. It takes two arguments. The first is
the code value in the 8-bit character set and the second is the corresponding code
value in the 16-bit character set. The same codes can be used multiple times (but
adding the same pair again has no effect). The first definition for a code is the one
that is used.
Consider the following example:
$m->addpair(0x20, 0x0020);
$m->addpair(0x20, 0x00A0);
$m->addpair(0xA0, 0x00A0);
It means that the characters 0x20 and 0xA0 in the 8-bit charset map to themselves in
the 16-bit set, but 0x00A0 in the 16-bit character set maps back to 0x20.
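The first-definition-wins rule can be modelled with two plain hashes, one per direction. This is only an illustrative sketch of the behaviour described above, not the module's implementation; the names %to16, %to8 and add_pair are hypothetical.

```perl
use strict;
use warnings;

my (%to16, %to8);   # hypothetical stand-ins for the two lookup directions

# Record a pair, keeping any earlier definition in either direction.
sub add_pair {
    my ($u8, $u16) = @_;
    $to16{$u8} = $u16 unless exists $to16{$u8};
    $to8{$u16} = $u8  unless exists $to8{$u16};
}

add_pair(0x20, 0x0020);
add_pair(0x20, 0x00A0);   # 8-bit 0x20 is already defined; only 0x00A0 -> 0x20 is new
add_pair(0xA0, 0x00A0);   # 16-bit 0x00A0 is already defined; only 0xA0 -> 0x00A0 is new

printf "8-bit 0x%02X -> 16-bit 0x%04X\n", $_, $to16{$_} for sort { $a <=> $b } keys %to16;
printf "16-bit 0x%04X -> 8-bit 0x%02X\n", $_, $to8{$_}  for sort { $a <=> $b } keys %to8;
```

Each direction keeps its own first definition, which is why 0x20 and 0xA0 both map to themselves going up while 0x00A0 maps to 0x20 going down.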
$m->default_to8( $u8 )
Set the code of the default character to use when mapping from 16-bit to 8-bit
strings. If there is no mapping pair defined for a character then this default is
substituted by to8() and recode8().
$m->default_to16( $u16 )
Set the code of the default character to use when mapping from 8-bit to 16-bit
strings. If there is no mapping pair defined for a character then this default is used
by to16(), tou() and recode8().
$m->nostrict;
All undefined mappings are replaced with the identity mapping. Undefined characters
are normally just removed (or replaced with the default if defined) when converting
between character sets.
$m->to8( $ustr );
Converts a 16-bit character string to the corresponding string in the 8-bit character
set.
$m->to16( $str );
Converts an 8-bit character string to the corresponding string in the 16-bit character
set.
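To make "16-bit string in network byte order" concrete, the sketch below builds such a string by hand with Perl's built-in pack() and takes it apart again with unpack(). It does not call Unicode::Map8; the sample code values are arbitrary and only illustrate the byte layout that the 16-bit strings in this document are described as using.

```perl
use strict;
use warnings;

# Three 16-bit code values: "A", LATIN SMALL LETTER A WITH RING ABOVE, newline.
my @codes = (0x0041, 0x00E5, 0x000A);

# "n" packs an unsigned 16-bit integer in network (big-endian) byte order.
my $ustr = pack("n*", @codes);

printf "%d bytes: %s\n", length($ustr),
    join(" ", map { sprintf "%02X", ord } split //, $ustr);

# unpack() with the same template recovers the code values.
my @back = unpack("n*", $ustr);
```

The same unpack("n*", ...) call is a convenient way to inspect any 16-bit string while debugging a mapping.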
$m->tou( $str );
Same as to16() but returns a Unicode::String object instead of a plain UCS-2 string.
$m->recode8($m2, $str);
Map the string $str from one 8-bit character set ($m) to another one ($m2). Since we
assume we know the mappings towards the common 16-bit encoding we can use this to
convert between any of the 8-bit character sets.
$m->to_char16( $u8 )
Maps a single 8-bit character code to a 16-bit code. If the 8-bit character is
unmapped then the constant NOCHAR is returned. The default is not used and the
callback method is not invoked.
$m->to_char8( $u16 )
Maps a single 16-bit character code to an 8-bit code. If the 16-bit character is
unmapped then the constant NOCHAR is returned. The default is not used and the
callback method is not invoked.
The following callback methods are available. You can override these methods by creating
a subclass of Unicode::Map8.
$m->unmapped_to8
When mapping to an 8-bit character string and there is no mapping defined (and no default
either), then this method is called as the last resort. It is called with a single
integer argument which is the code of the unmapped 16-bit character. It is expected
to return a string that will be incorporated in the 8-bit string. The default version
of this method always returns an empty string.
Example:
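package MyMapper;
require Unicode::Map8;
our @ISA = qw(Unicode::Map8);
# Called with the code of a 16-bit character that has no 8-bit mapping.
sub unmapped_to8
{
    my($self, $code) = @_;
    require Unicode::CharName;
    # Substitute the Unicode character name in angle brackets.
    "<" . Unicode::CharName::uname($code) . ">";
}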
$m->unmapped_to16
Likewise, when mapping to a 16-bit character string and no mapping is defined, this
method is called. It should return a 16-bit string with the bytes in network byte
order. The default version of this method always returns an empty string.
FILES
The Unicode::Map8 constructor can parse two different file formats: a binary format and a
textual format.
The binary format is simple. It consists of a sequence of 16-bit integer pairs in network
byte order. The first pair should contain the magic value 0xFFFE, 0x0001. Of each pair,
the first value is the code of an 8-bit character and the second is the code of the 16-bit
character. It follows from this that the first value should be less than 256.
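The binary layout described above can be written and read with pack() and unpack(). The following is a minimal sketch in plain Perl, independent of the module; encode_binary_map and decode_binary_map are hypothetical helper names, and whether the real constructor performs the same sanity checks is not stated in this document.

```perl
use strict;
use warnings;

# Encode a list of [8-bit code, 16-bit code] pairs: the magic pair first,
# then the mapping pairs, all as 16-bit integers in network byte order.
sub encode_binary_map {
    my @pairs = @_;
    return pack("n*", 0xFFFE, 0x0001, map { @$_ } @pairs);
}

# Decode such a string back into a list of pairs, checking the magic pair
# and that every 8-bit code is below 256.
sub decode_binary_map {
    my ($data) = @_;
    my @words = unpack("n*", $data);
    die "odd number of 16-bit values\n" if @words % 2;
    my ($hi, $lo) = splice(@words, 0, 2);
    die "bad magic\n" unless defined $lo && $hi == 0xFFFE && $lo == 0x0001;
    my @pairs;
    while (my ($u8, $u16) = splice(@words, 0, 2)) {
        die sprintf("8-bit code 0x%X out of range\n", $u8) if $u8 > 0xFF;
        push @pairs, [$u8, $u16];
    }
    return @pairs;
}

my $bin   = encode_binary_map([0x41, 0x0041], [0xE5, 0x00E5]);
my @pairs = decode_binary_map($bin);
printf "0x%02X 0x%04X\n", @$_ for @pairs;
```

For real files, the map8_txt2bin and map8_bin2txt scripts shipped with the module remain the supported way to convert between formats.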
The textual format consists of lines, each of which is either a comment (first non-blank
character is '#'), a completely blank line, or a line with two hexadecimal numbers. The
hexadecimal numbers must be preceded by "0x" as in C and Perl. This is the same format used
by the Unicode mapping files available from <URL:ftp://ftp.unicode.org/Public>.
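A reader for the textual format might look like the sketch below. It is plain Perl and does not use the module; parse_text_map is a hypothetical name, and treating any text after the second number as ignorable is an assumption rather than something this document specifies.

```perl
use strict;
use warnings;

# Parse mapping text and return a list of [8-bit code, 16-bit code] pairs.
sub parse_text_map {
    my ($text) = @_;
    my @pairs;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;    # completely blank line
        next if $line =~ /^\s*#/;    # comment: first non-blank character is '#'
        if ($line =~ /^\s*0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)/) {
            push @pairs, [hex($1), hex($2)];
        }
        # Lines of any other shape are skipped in this sketch.
    }
    return @pairs;
}

my $sample = <<'EOT';
# 8-bit code, then 16-bit code
0x41    0x0041

0xE5    0x00E5  # LATIN SMALL LETTER A WITH RING ABOVE
EOT

printf "0x%02X -> 0x%04X\n", @$_ for parse_text_map($sample);
```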
The mapping table files are installed in the Unicode/Map8/maps directory somewhere in the
Perl @INC path. The variable $Unicode::Map8::MAPS_DIR is the complete path name to this
directory. Binary mapping files are stored within this directory with the suffix .bin.
Textual mapping files are stored with the suffix .txt.
The scripts map8_bin2txt and map8_txt2bin can translate between these mapping file
formats.
A special file called aliases within $MAPS_DIR specifies all the alias names that can be
used to denote the various character sets. The first name on each line is the real file
name and the rest are alias names separated by spaces.
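The line layout of the aliases file lends itself to a simple lookup table. The sketch below is plain Perl and only illustrates the layout; parse_aliases is a hypothetical name, and the sample lines are made up, not copied from an installed aliases file.

```perl
use strict;
use warnings;

# Build a lookup from every alias (and the real name itself) to the real file name.
sub parse_aliases {
    my ($text) = @_;
    my %real_name_for;
    for my $line (split /\n/, $text) {
        my ($real, @aliases) = split ' ', $line;
        next unless defined $real;          # skip blank lines
        $real_name_for{$_} = $real for $real, @aliases;
    }
    return %real_name_for;
}

# Made-up sample content; the entries in an installed aliases file will differ.
my %alias = parse_aliases("CHARSET-A a alpha\nCHARSET-B b\n");
print "alpha -> $alias{alpha}\n";
```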
The "umap --list" command can be used to list the character sets supported.
BUGS
Does not handle Unicode surrogate pairs as a single character.
SEE ALSO
umap(1), Unicode::String
COPYRIGHT
Copyright 1998 Gisle Aas.
This library is free software; you can redistribute it and/or modify it under the same
terms as Perl itself.
perl v5.34.0 2022-02-07 Map8(3pm)