NAME
    Unicode::MapUTF8 - Conversions to and from arbitrary character sets and UTF8

SYNOPSIS
     use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset utf8_charset_alias);

     # Convert a string in 'ISO-8859-1' to 'UTF8'
     my $output = to_utf8({ -string => 'An example', -charset => 'ISO-8859-1' });

     # Convert a string in 'UTF8' encoding to encoding 'ISO-8859-1'
     my $other  = from_utf8({ -string => 'Other text', -charset => 'ISO-8859-1' });

     # List available character set encodings
     my @character_sets = utf8_supported_charset;

     # Add a character set alias
     utf8_charset_alias({ 'ms-japanese' => 'sjis' });

     # Convert between two arbitrary (but largely compatible) charset encodings
     # (SJIS to EUC-JP)
     my $utf8_string   = to_utf8({ -string => $sjis_string, -charset => 'sjis' });
     my $euc_jp_string = from_utf8({ -string => $utf8_string, -charset => 'euc-jp' });

     # Verify that a specific character set is supported
     if (utf8_supported_charset('ISO-8859-1')) {
         # Yes
     }

DESCRIPTION
    Provides an adapter layer between core routines for converting to and from UTF8 and other
    encodings. In essence, it gives multiple existing Unicode modules a single common interface,
    so you don't have to know the underlying implementations to do simple conversions between UTF8
    and other character set encodings. As such, it wraps the Unicode::String, Unicode::Map8,
    Unicode::Map and Jcode modules in a standardized and simple API.

    This also provides a general character set conversion facility based on UTF8: it is possible to
    convert between any two compatible and supported character sets via a simple two-step chaining
    of conversions (source charset to UTF8, then UTF8 to target charset).
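
    The two-step chaining can be wrapped in a small helper. The following is a minimal sketch:
    convert_charset() is a hypothetical name chosen for illustration and is not part of this
    module, and the sample input is plain ASCII so that it is valid in both encodings.

```perl
use strict;
use warnings;
use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);

# convert_charset() is a hypothetical helper, not part of Unicode::MapUTF8.
sub convert_charset {
    my ($string, $source, $target) = @_;

    # Fail early with a clear message if either charset is unknown.
    for my $charset ($source, $target) {
        die "Unsupported charset: $charset\n"
            unless utf8_supported_charset($charset);
    }

    # Step 1: source charset -> UTF8. Step 2: UTF8 -> target charset.
    my $utf8 = to_utf8({ -string => $string, -charset => $source });
    return from_utf8({ -string => $utf8, -charset => $target });
}

# Placeholder input: ASCII-only text, which is valid SJIS and EUC-JP alike.
my $sjis_string   = 'sample text';
my $euc_jp_string = convert_charset($sjis_string, 'sjis', 'euc-jp');
```

    Characters that exist in the source charset but not in the target cannot survive the second
    step, which is why the two charsets need to be compatible.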

    As with most things Perlish, it handles many more characters per second when given a few big
    chunks of text to chew on instead of lots of small ones.
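
    One way to follow that advice is to join small pieces before converting. This is a sketch
    only: the sample lines are made up, and the actual speed-up depends on the input and on the
    underlying conversion module.

```perl
use strict;
use warnings;
use Unicode::MapUTF8 qw(to_utf8);

# Made-up ISO-8859-1 input; in practice these lines might come from a file.
my @lines = ("first line\n", "second line\n", "third line\n");

# One call on the joined text instead of one call per line.
my $utf8_text = to_utf8({
    -string  => join('', @lines),
    -charset => 'ISO-8859-1',
});

# Newline is the same single byte in ISO-8859-1 and UTF8, so the
# converted text can be split back into lines afterwards.
my @utf8_lines = split /^/, $utf8_text;
```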

    By design, it can be easily extended to encompass any new charset encoding conversion modules
    that arrive on the scene.

    This module is intended to provide good Unicode support to versions of Perl prior to 5.8. If you
    are using Perl 5.8.0 or later, you probably want to be using the Encode module instead. This
    module does work with Perl 5.8, but Encode is the preferred method in that environment.
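
    For comparison, a rough counterpart of to_utf8 and from_utf8 using the core Encode module
    might look like the sketch below. It is an approximation rather than a drop-in replacement:
    Encode distinguishes decoded character strings from encoded byte strings, so each conversion
    is written as an explicit decode followed by an encode.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "caf" followed by e-acute, as the single ISO-8859-1 byte 0xE9.
my $latin1_bytes = "caf\xE9";

# Rough counterpart of to_utf8(): decode the source bytes into Perl
# characters, then encode those characters as UTF-8 bytes.
my $utf8_bytes = encode('UTF-8', decode('ISO-8859-1', $latin1_bytes));

# Rough counterpart of from_utf8(): decode UTF-8, encode to the target.
my $round_trip = encode('ISO-8859-1', decode('UTF-8', $utf8_bytes));

# Prints "4 bytes as ISO-8859-1, 5 bytes as UTF-8".
printf "%d bytes as ISO-8859-1, %d bytes as UTF-8\n",
    length $latin1_bytes, length $utf8_bytes;
```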

CHANGES
    1.14 2020.09.27 Fixing POD breakage in EUC-JP version of POD

    1.13 2020.09.27 Fixing MANIFEST.SKIP error

    1.12 2020.09.27 Build tool updates. Maintainer updates. POD error fixes. Relicensed under MIT
    license.

    1.11 2005.10.10 Documentation changes. Addition of Build.PL support. Added various build tests,
    LICENSE, Artistic_License.txt, GPL_License.txt. Split documentation into separate .pod file.
    Added Japanese translation of POD.

    1.10 2005.05.22 - Fixed bug in conversion of ISO-2022-JP to UTF-8. Problem and fix found by
    Masahiro HONMA <masahiro.honma AT tsutaya.jp>.

                      Similar bugs in conversions of shift_jis and euc-jp
                      to UTF-8 corrected as well.

    1.09 2001.08.22 - Fixed multiple typo occurrences of 'uft' where 'utf' was meant in code. Problem
    affected utf16 and utf7 encodings. Problem found by devon smith <devon AT taller.edu>

    1.08 2000.11.06 Added 'utf8_charset_alias' function to allow for runtime setting of character
    set aliases. Added several alternate names for 'sjis' (shiftjis, shift-jis, shift_jis, s-jis,
    and s_jis).

                    Corrected 'croak' messages for 'from_utf8' functions to
                    appropriate function name.

                    Corrected fatal problem in jcode-unicode internals. Problem
                    and fix found by Brian Wisti <wbrian2 AT uswest.net>.

    1.07 2000.11.01 Added 'croak' to use Carp declaration to fix error messages. Problem and fix
    found by <wbrian2 AT uswest.net>.

    1.06 2000.10.30 Fix to handle change in stringification of overloaded objects between Perl 5.005
    and 5.6. Problem noticed by Brian Wisti <wbrian2 AT uswest.net>.

    1.05 2000.10.23 Error in conversions from UTF8 to multibyte encodings corrected

    1.04 2000.10.23 Additional diagnostic error messages added for internal errors

    1.03 2000.10.22 Bug fix for load time Unicode::Map encoding detection

    1.02 2000.10.22 Bug fix to 'from_utf8' method and load time detection of Unicode::Map8 supported
    character set encodings

    1.01 2000.10.02 Initial public release

FUNCTIONS
    utf8_charset_alias({ $alias => $charset });
        Used for runtime assignment of character set aliases.

        Called with no parameters, returns a reference to a hash of defined aliases and the
        character sets they map to.

        Example:

          my $aliases     = utf8_charset_alias;
          my @alias_names = keys %$aliases;

        If called with ONE parameter, returns the name of the 'real' charset if the alias is
        defined. Returns undef if it is not found in the aliases.

        Example:

            if (! utf8_charset_alias('VISCII')) {
                # No alias for this
            }

        If called with a list of 'alias' => 'charset' pairs, defines those aliases for use.

        Example:

            utf8_charset_alias({ 'japanese' => 'sjis', 'japan' => 'sjis' });

        Note: It will croak if a passed pair does not map to a character set in the predefined set
        of character encodings. Aliasing something to another alias is NOT allowed.

        Multiple character set aliases can be set with a single call.

        To clear an alias, pass a character set mapping of undef.

        Example:

            utf8_charset_alias({ 'japanese' => undef });

        While an alias is set, the 'utf8_supported_charset' function will return the alias as if it
        were a predefined charset.

        Overriding a base defined character encoding with an alias will generate a warning message
        to STDERR.
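
        Putting the calls above together, the life cycle of an alias might look like the
        following sketch. The alias name 'my-sjis' is arbitrary and chosen only for this
        example.

```perl
use strict;
use warnings;
use Unicode::MapUTF8 qw(utf8_charset_alias utf8_supported_charset);

# 'my-sjis' is an arbitrary alias name used only for this example.
utf8_charset_alias({ 'my-sjis' => 'sjis' });

# One-parameter form: look up the real charset behind the alias.
my $real_charset = utf8_charset_alias('my-sjis');

# While the alias is set, it is reported as a supported charset.
my $supported_while_set = utf8_supported_charset('my-sjis') ? 1 : 0;

# Clear the alias by mapping it to undef, then check again.
utf8_charset_alias({ 'my-sjis' => undef });
my $supported_after_clear = utf8_supported_charset('my-sjis') ? 1 : 0;

print "alias supported while set: $supported_while_set, ",
      "after clearing: $supported_after_clear\n";
```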

    utf8_supported_charset($charset_name);
        Returns true if the named charset is supported (including user defined aliases).

        Returns false if it is not.

        Example:

            if (! utf8_supported_charset('VISCII')) {
                # No support yet
            }

        If called in a list context with no parameters, it will return a list of all supported
        character set names (including user defined aliases).

        Example:

            my @charsets = utf8_supported_charset;

    to_utf8({ -string => $string, -charset => $source_charset });
        Returns the string converted to UTF8 from the specified source charset.

    from_utf8({ -string => $string, -charset => $target_charset});
        Returns the string converted from UTF8 to the specified target charset.
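
        Example (a sketch: the eval guard assumes that a failed conversion either croaks, as
        the 'croak' entries under CHANGES suggest, or returns undef):

```perl
use strict;
use warnings;
use Unicode::MapUTF8 qw(to_utf8 from_utf8);

# "caf" followed by e-acute, as the single ISO-8859-1 byte 0xE9.
my $latin1 = "caf\xE9";

# Convert to UTF8 and back; eval traps a croak from either call.
my $back = eval {
    my $utf8 = to_utf8({ -string => $latin1, -charset => 'ISO-8859-1' });
    from_utf8({ -string => $utf8, -charset => 'ISO-8859-1' });
};

if (!defined $back) {
    warn "Conversion failed: $@";
}
elsif ($back eq $latin1) {
    print "Round trip preserved the string\n";
}
else {
    print "Round trip changed the string\n";
}
```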

VERSION
    1.14 2020.09.27

TODO
    Regression tests for Jcode, 2-byte encodings and encoding aliases

SEE ALSO
    Unicode::String Unicode::Map8 Unicode::Map Jcode Encode

COPYRIGHT
    Copyright 2000-2020, Jerilyn Franz. All rights reserved.

AUTHOR
    Jerilyn Franz <cpan AT jerilyn.info>

LICENSE
    MIT License

    Copyright (c) 2020 Jerilyn Franz

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software
    and associated documentation files (the "Software"), to deal in the Software without
    restriction, including without limitation the rights to use, copy, modify, merge, publish,
    distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the
    Software is furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or
    substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
    BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
    DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
