
🟒 NAME

Encode - character encodings in Perl

Quick Reference

| Use Case | Command | Description |
|---|---|---|
| Encode string to octets | encode(ENCODING, STRING [, CHECK]) | Convert a Perl string to a byte sequence in the given encoding |
| Decode octets to string | decode(ENCODING, OCTETS [, CHECK]) | Convert a byte sequence to Perl's internal string form |
| Find encoding object | find_encoding(ENCODING) | Return an encoding object for reuse (e.g., $obj->decode()) |
| Convert between encodings | from_to($octets, FROM_ENC, TO_ENC [, CHECK]) | In-place conversion of octet data |
| File I/O with encoding | open(FH, "< :encoding(ENCODING)", $file) | Automatically decode/encode via a PerlIO layer |
| List loaded encodings | Encode->encodings() | Get the canonical names of loaded encodings |
| List all encodings | Encode->encodings(":all") | Get all available encodings (including unloaded ones) |

SYNOPSIS

use Encode qw(decode encode);
$characters = decode('UTF-8', $octets,     Encode::FB_CROAK);
$octets     = encode('UTF-8', $characters, Encode::FB_CROAK);

Table of Contents

Encode consists of a collection of modules whose details are too extensive to fit in one document. This document explains the top-level APIs and general topics at a glance. For other topics and more details, see the documentation for the modules listed under SEE ALSO.

DESCRIPTION

The Encode module provides the interface between Perl strings and the rest of the system. Perl strings are sequences of characters.

The repertoire of characters that Perl can represent is a superset of those defined by the Unicode Consortium. On most platforms the ordinal value of a character, as returned by ord(S), is the Unicode codepoint for that character. The exceptions are platforms where the legacy encoding is some variant of EBCDIC rather than a superset of ASCII; see perlebcdic.

In recent history, data has been moved around a computer in 8-bit chunks, often called "bytes" but also known as "octets" in standards documents. Perl is widely used to manipulate data of many types: not only strings of characters representing human or computer languages, but also "binary" data, that is, the machine's representation of numbers, pixels in an image, or just about anything.

When Perl is processing "binary data", the programmer wants Perl to process "sequences of bytes". This is not a problem for Perl: because a byte has 256 possible values, it easily fits in Perl's much larger "logical character".
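The distinction between characters and octets can be illustrated with a short round trip. This is only a sketch: the sample character U+263A and the variable names are chosen for illustration and do not come from the original text.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# One logical character: U+263A WHITE SMILING FACE (illustrative choice).
my $characters = "\x{263A}";

# Encoding turns it into a sequence of octets (three of them in UTF-8).
my $octets = encode('UTF-8', $characters);

# Decoding the octets yields the original character string again.
my $roundtrip = decode('UTF-8', $octets);

printf "characters: %d, octets: %d\n", length($characters), length($octets);
```

Here length() counts one character before encoding and three octets afterwards, which is why the two kinds of string should not be mixed.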

This document mostly explains the how. perlunitut and perlunifaq explain the why.

TERMINOLOGY

THE PERL ENCODING API

Basic methods

Listing available encodings

See Encode::Supported for details.
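As a minimal sketch, the two forms of Encode->encodings() shown in the Quick Reference can be compared as follows. The variable names are illustrative, and the exact lists depend on the local Encode installation.

```perl
use strict;
use warnings;
use Encode;

# Canonical names of the encodings that are currently loaded.
my @loaded = Encode->encodings();

# Canonical names of every available encoding, loaded or not.
my @all = Encode->encodings(":all");

print "loaded: @loaded\n";
printf "available: %d encodings\n", scalar @all;
```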

Defining Aliases

use Encode;
use Encode::Alias;
define_alias(NEWNAME => ENCODING);

After that, NEWNAME can be used as an alias for ENCODING, which may be either an encoding name or an encoding object. Check whether an alias exists with resolve_alias:

Encode::resolve_alias("latin1") eq "iso-8859-1" # true
Encode::resolve_alias("iso-8859-12")   # false; nonexistent
Encode::resolve_alias($name) eq $name  # true if $name is canonical

resolve_alias can be imported via use Encode qw(resolve_alias). See Encode::Alias.
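A runnable sketch of the alias workflow might look like this. The alias name "MyLatin" and the nonexistent name "no-such-encoding-xyz" are made up for the example.

```perl
use strict;
use warnings;
use Encode;
use Encode::Alias;   # exports define_alias

# "MyLatin" is a hypothetical alias name used only for this illustration.
define_alias(MyLatin => 'iso-8859-1');

# resolve_alias returns the canonical name, or a false value if none exists.
my $canonical = Encode::resolve_alias('MyLatin');
my $missing   = Encode::resolve_alias('no-such-encoding-xyz');

# The alias can now be used wherever an encoding name is expected.
my $octets = encode('MyLatin', "caf\x{e9}");

print "MyLatin resolves to $canonical\n";
```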

🌐 Finding IANA Character Set Registry names

Canonical names may not match IANA registry names (e.g., "utf-8-strict" vs "UTF-8"). Method mime_name() returns the proper IANA name:

use Encode;
my $enc = find_encoding("UTF-8");
warn $enc->name;      # utf-8-strict
warn $enc->mime_name; # UTF-8

See Encode::Encoding.

Encoding via PerlIO

Use the :encoding(ENC) layer on a filehandle to decode on input and encode on output automatically:

### Version 1 via PerlIO
open(INPUT,  "< :encoding(shiftjis)", $infile)
    || die "Can't open < $infile for reading: $!";
open(OUTPUT, "> :encoding(euc-jp)",  $outfile)
    || die "Can't open > $outfile for writing: $!";
while (<INPUT>) {   # auto decodes $_
    print OUTPUT;   # auto encodes $_
}

### Version 2 via from_to()
open(INPUT,  "< :raw", $infile) || die ...;
open(OUTPUT, "> :raw",  $outfile) || die ...;
while (<INPUT>) {
    from_to($_, "shiftjis", "euc-jp", 1);
    print OUTPUT;
}

Check if encoding supports PerlIO with perlio_ok:

Encode::perlio_ok("hz");             # false
find_encoding("euc-cn")->perlio_ok;  # true (where available)
use Encode qw(perlio_ok);            # imported upon request
perlio_ok("euc-jp")

All core encodings except "hz" and "ISO-2022-kr" are PerlIO-savvy. See Encode::Encoding and Encode::PerlIO.
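The effect of the :encoding layer can be observed by writing through it and then reading the file back raw. This is a sketch under assumptions: it uses File::Temp for a scratch file and UTF-8 as the encoding, neither of which appears in the examples above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for the demonstration; the name is chosen by File::Temp.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Write one character through an encoding layer: it is encoded on output.
open(my $out, '>:encoding(UTF-8)', $path) or die "Can't open $path: $!";
print {$out} "\x{263A}";
close $out or die "Can't close $path: $!";

# Read the file back without any layer to see the octets that were stored.
open(my $raw, '<:raw', $path) or die "Can't open $path: $!";
my $octets = do { local $/; <$raw> };
close $raw;

# Read it again through the layer: the octets are decoded on input.
open(my $in, '<:encoding(UTF-8)', $path) or die "Can't open $path: $!";
my $characters = do { local $/; <$in> };
close $in;

printf "octets: %d, characters: %d\n", length($octets), length($characters);
```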

Handling Malformed Data

The optional CHECK argument controls what happens when malformed data is encountered. The default is Encode::FB_DEFAULT (== 0). As of version 2.12, CHECK may also be a code reference. Not all encodings honor CHECK; Encode::Unicode, for example, always croaks on error.

List of CHECK values

| Constant | Value | Behavior |
|---|---|---|
| FB_DEFAULT | 0 | Replace each malformed character with a substitution character (SUBCHAR on encode, U+FFFD on decode). Warns if the data is supposed to be UTF-8. |
| FB_CROAK | 1 | Die immediately with an error message. |
| FB_QUIET | bitmask | Return the portion processed so far on error; the unprocessed data remains in the argument. |
| FB_WARN | bitmask | Same as FB_QUIET, but also issues a warning. These warnings are independent of the warnings pragma; use Encode::ONLY_PRAGMA_WARNINGS to follow lexical warnings (since 2.99). |
| FB_PERLQQ | bitmask | Insert \xHH on decode, \x{HHHH} on encode. |
| FB_HTMLCREF | bitmask | Insert &#NNN; (decimal) on encode. |
| FB_XMLCREF | bitmask | Insert &#xHHHH; (hex) on encode. |

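Bitmask breakdown:

| Flag | Hex | FB_DEFAULT | FB_CROAK | FB_QUIET | FB_WARN | FB_PERLQQ |
|---|---|---|---|---|---|---|
| DIE_ON_ERR | 0x0001 | | X | | | |
| WARN_ON_ERR | 0x0002 | | | | X | |
| RETURN_ON_ERR | 0x0004 | | | X | X | |
| LEAVE_SRC | 0x0008 | | | | | X |
| PERLQQ | 0x0100 | | | | | X |
| HTMLCREF | 0x0200 | | | | | |
| XMLCREF | 0x0400 | | | | | |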

LEAVE_SRC: If CHECK is set but the LEAVE_SRC bit is not, the source string passed to encode() or decode() is overwritten in place. To preserve the input, bitwise-OR Encode::LEAVE_SRC into CHECK.
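The following sketch exercises several CHECK values against the same malformed input. The sample data and variable names are illustrative assumptions. The source is kept in a variable rather than a literal because some CHECK values overwrite it.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "abc", one octet that can never appear in UTF-8, then "def".
my $bad = "abc\xFFdef";

# FB_DEFAULT (0): the malformed octet becomes U+FFFD; the source is untouched.
my $src      = $bad;
my $replaced = decode('UTF-8', $src);

# FB_CROAK: decoding dies, so trap the exception with eval.
$src = $bad;
my $ok = eval { decode('UTF-8', $src, Encode::FB_CROAK()); 1 };

# FB_QUIET: returns what was decoded so far and leaves the rest in $src.
$src = $bad;
my $partial = decode('UTF-8', $src, Encode::FB_QUIET());
my $rest    = $src;

# FB_QUIET | LEAVE_SRC: same return value, but $src is preserved.
$src = $bad;
my $partial2 = decode('UTF-8', $src, Encode::FB_QUIET() | Encode::LEAVE_SRC());
my $kept     = $src;

# FB_HTMLCREF: an unmappable character becomes a decimal character reference.
my $text = "caf\x{e9}";
my $html = encode('ascii', $text, Encode::FB_HTMLCREF());

print "partial='$partial' html='$html'\n";
```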

Coderef for CHECK

As of version 2.12, CHECK can be a code reference. When encoding, the coderef receives the ordinal value of the unmapped character and returns the octets to use as a fallback.

$ascii = encode("ascii", $utf8, sub{ sprintf "<U+%04X>", shift });

When decoding, the coderef receives a list of ordinal values and returns the decoded string.

$str = decode 'UTF-8', $octets, sub {
    my $tmp = join '', map chr, @_;
    return decode 'ISO-8859-15', $tmp;
};
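A self-contained version of the encode-side fallback could look like the sketch below. The sample string is an assumption; the callback is the one from the snippet above.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A string with one character (U+00E9) that ASCII cannot represent.
my $text = "caf\x{e9}";

# The callback receives the character's ordinal and returns the fallback.
my $ascii = encode('ascii', $text, sub { sprintf "<U+%04X>", shift });

print "$ascii\n";   # caf<U+00E9>
```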

Defining Encodings

use Encode qw(define_encoding);
define_encoding($object, CANONICAL_NAME [, alias...]);

Associates $object with the canonical name and any optional aliases. See Encode::Encoding.

🚩 The UTF8 flag

Before Perl 5.8, eq compared strings directly. Since 5.8, eq considers the UTF8 flag. Quoting Programming Perl, 3rd ed.:

The UTF8 flag is not visible in scripts; you can peek with internal functions (see below).
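As a sketch of such peeking, Encode::is_utf8 (a function of the Encode module that this page does not otherwise describe) reports whether the flag is set. The sample octets and variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Two octets: the UTF-8 encoding of U+00E9.
my $octets     = "\xC3\xA9";
my $characters = decode('UTF-8', $octets);

# Encode::is_utf8 reports the internal flag, not whether the data is valid.
my $flag_on_characters = Encode::is_utf8($characters) ? 1 : 0;
my $flag_on_octets     = Encode::is_utf8(encode('UTF-8', $characters)) ? 1 : 0;

print "characters: $flag_on_characters, octets: $flag_on_octets\n";
```

The flag is an implementation detail, so ordinary code should rely on decode() and encode() rather than on the flag itself.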

Messing with Perl's Internals

UTF-8 vs. utf8 vs. UTF8

Historically, Perl used a loose interpretation of UTF-8 that allowed code points up to 32 bits and surrogates. Official UTF-8 is stricter: code points 0..0x10_FFFF only, no surrogates, and no non-shortest encodings. As of Perl 5.8.7 and Encode 2.10, the name "UTF-8" selects the strict interpretation and "utf8" the loose one.

Examples:

encode("utf8",  "\x{FFFF_FFFF}", 1); # okay (loose)
encode("UTF-8", "\x{FFFF_FFFF}", 1); # croaks (strict)
find_encoding("UTF-8")->name # 'utf-8-strict'
find_encoding("utf-8")->name # ditto (case/underscore insensitive)
find_encoding("UTF8")->name  # 'utf8'

SEE ALSO

Encode::Encoding, Encode::Supported, Encode::PerlIO, encoding, perlebcdic, "open" in perlfunc, perlunicode, perluniintro, perlunifaq, perlunitut, utf8, the Perl Unicode Mailing List <http://lists.perl.org/list/perl-unicode.html>

MAINTAINER

This project was originated by the late Nick Ing-Simmons and later maintained by Dan Kogai <dankogai@cpan.org>. See AUTHORS for a full list of people involved. For any questions, send mail to <perl-unicode@perl.org> so that we can all share.

While Dan Kogai retains the copyright as a maintainer, credit should go to all those involved. See AUTHORS for a list of those who submitted code to the project.

COPYRIGHT

Copyright 2002-2014 Dan Kogai <dankogai@cpan.org>.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
