NAME
Jcode - Japanese Charset Handler
SYNOPSIS
use Jcode;
#
# traditional
Jcode::convert(\$str, $ocode, $icode, "z");
# or OOP!
print Jcode->new($str)->h2z->tr($from, $to)->utf8;
DESCRIPTION
<The Japanese documentation is now available as Jcode::Nihongo.>
Jcode.pm supports both the object-oriented and the traditional approach.
With the object-oriented approach, you can write:
$iso_2022_jp = Jcode->new($str)->h2z->jis;
Which is more elegant than:
$iso_2022_jp = $str;
&jcode::convert(\$iso_2022_jp, 'jis', &jcode::getcode(\$str), "z");
For those unfamiliar with objects, Jcode.pm still supports "getcode()"
and "convert()."
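The two styles can be compared side by side. The following is a minimal sketch, assuming Jcode is installed; the two-byte EUC-JP sample string and the variable names are illustrative and do not come from the original documentation.

```perl
use strict;
use warnings;
use Jcode;

# Sample input (an assumption for illustration): HIRAGANA LETTER A in EUC-JP.
my $euc = "\xA4\xA2";

# Object style: build an object, then ask for the encoding you want.
my $sjis_oo = Jcode->new($euc, 'euc')->sjis;

# Traditional style: convert a copy in place through a reference.
my $sjis_trad = $euc;
Jcode::convert(\$sjis_trad, 'sjis', 'euc');

print $sjis_oo eq $sjis_trad ? "same result\n" : "results differ\n";
```

Passing $icode explicitly in both calls avoids relying on auto-detection for such a short string.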
If the perl version is 5.8.1 or later, Jcode acts as a wrapper to Encode,
the standard charset handler module for Perl 5.8 or later.
Methods
All methods mentioned here return a Jcode object unless otherwise
noted.
Constructors
$j = Jcode->new($str [, $icode])
Creates a Jcode object $j from $str. The input encoding is detected
automatically unless you set $icode explicitly. For the available
charsets, see getcode below.
For perl 5.8.1 or better, $icode can be *any encoding name* that
Encode understands.
$j = Jcode->new($european, 'iso-latin1');
When the object is stringified, it returns the EUC-converted string, so
you can write "print $j" instead of "print $j->euc".
Passing Reference
Instead of a scalar value, you can pass a reference:
Jcode->new(\$str);
This saves a little time, at the cost of the value of $str itself being
converted. (In a way, $str is now "tied" to the Jcode object.)
$j->set($str [, $icode])
Sets $j's internal string to $str. Handy when you use a Jcode object
repeatedly (it saves the time and memory needed to create a new object).
# converts mailbox to SJIS format
my $jconv = new Jcode;
$/ = 00;
while(<>){
print $jconv->set(\$_)->mime_decode->sjis;
}
$j->append($str [, $icode]);
Appends $str to $j's internal string.
$j = jcode($str [, $icode]);
Shortcut for Jcode->new(), so you can write, for example,
"jcode($str)->sjis".
Encoded Strings
In general, you can retrieve the *encoded* string as $j->*encoded*.
$sjis = jcode($str)->sjis
$euc = $j->euc
$jis = $j->jis
$sjis = $j->sjis
$ucs2 = $j->ucs2
$utf8 = $j->utf8
What you code is what you get :)
$iso_2022_jp = $j->iso_2022_jp
Same as "$j->h2z->jis". Hankaku Kanas are forcibly converted to
Zenkaku.
For perl 5.8.1 and better, you can also use any encoding names and
aliases that Encode supports. For example:
$european = $j->iso_latin1; # replace '-' with '_' for names.
FYI: Encode::Encoder uses a similar trick.
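To see what the accessors above return, the bytes can be dumped in hex. This is a sketch under the assumption that Jcode is installed; the sample character and the loop are illustrative only.

```perl
use strict;
use warnings;
use Jcode;

# Sample input (illustrative): HIRAGANA LETTER A, given as EUC-JP bytes.
my $j = Jcode->new("\xA4\xA2", 'euc');

# Call each accessor by name and dump the resulting bytes in hex.
for my $enc (qw(euc sjis jis utf8)) {
    my $bytes = $j->$enc;
    my $hex   = join ' ', map { sprintf '%02X', ord } split //, $bytes;
    printf "%-4s %s\n", $enc, $hex;
}
```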
$j->fallback($fallback)
For perl 5.8.1 or better, Jcode stores the internal string in
UTF-8. Any character that does not map to *->encoding* is replaced
with a '?', which is the Encode standard.
my $unistr = "\x{262f}"; # YIN YANG
my $j = jcode($unistr); # $j->euc is '?'
You can change this behavior by specifying a fallback, as in Encode.
The values are the same as those of Encode. "Jcode::FB_PERLQQ",
"Jcode::FB_XMLCREF", and "Jcode::FB_HTMLCREF" are aliased to those of
Encode for convenience.
print $j->fallback(Jcode::FB_PERLQQ)->euc; # '\x{262f}'
print $j->fallback(Jcode::FB_XMLCREF)->euc; # '&#x262f;'
print $j->fallback(Jcode::FB_HTMLCREF)->euc; # '&#9775;'
The global variable $Jcode::FALLBACK stores the default fallback so
you can override that by assigning the value.
$Jcode::FALLBACK = Jcode::FB_PERLQQ; # set default fallback scheme
[@lines =] $jcode->jfold([$width, $newline_str, $kref])
Folds lines in the jcode string every $width (default: 72), using the
newline string specified by $newline_str (default: "\n"). $width is
counted in "halfwidth" characters; fullwidth characters count as two.
Rudimentary kinsoku support is available for Perl 5.8.1 and better.
$length = $jcode->jlength();
Returns the length in characters rather than in bytes.
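A short sketch of both methods follows, assuming Jcode is installed. The input strings and the width of 10 are arbitrary choices for illustration; $kref is left at its default.

```perl
use strict;
use warnings;
use Jcode;

# Illustrative input: 30 ASCII letters, so every character is halfwidth.
my $text = join '', ('a' .. 'z', 'A' .. 'D');

# In list context jfold returns the folded lines; 10 is an arbitrary width.
my @lines = Jcode->new($text)->jfold(10);
print "$_\n" for @lines;

# jlength counts characters: two EUC-JP hiragana occupy four bytes.
my $kana = "\xA4\xA2\xA4\xA4";
printf "bytes=%d chars=%d\n", length($kana), Jcode->new($kana, 'euc')->jlength;
```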
Methods that use MIME::Base64
To use the methods below, you need MIME::Base64. To install it, simply run
perl -MCPAN -e 'CPAN::Shell->install("MIME::Base64")'
If your perl is 5.8 or better, there is no need, since MIME::Base64 is
bundled.
$mime_header = $j->mime_encode([$lf, $bpl])
Converts $str to a MIME header as documented in RFC1522. When $lf is
specified, it is used to fold lines (default: \n). When $bpl is
specified, it is used as the number of bytes per line (default: 76; this
number must not exceed 76).
For Perl 5.8.1 or better, you can also encode MIME Header as:
$mime_header = $j->MIME_Header;
In which case the resulting $mime_header is MIME-B-encoded UTF-8,
whereas "$j->mime_encode()" returns MIME-B-encoded ISO-2022-JP. Most
modern MUAs support both.
$j->mime_decode;
Decodes the MIME header in the Jcode object. For perl 5.8.1 or better,
you can also do the same with:
Jcode->new($str, 'MIME-Header')
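Encoding and decoding can be checked with a round trip. This is a minimal sketch, assuming Jcode and MIME::Base64 are installed; the sample text and the "Subject:" prefix are illustrative.

```perl
use strict;
use warnings;
use Jcode;

# Illustrative subject text: HIRAGANA LETTER A in EUC-JP.
my $euc = "\xA4\xA2";

# Encode to a MIME-B header (ISO-2022-JP inside, per the text above).
my $header = Jcode->new($euc, 'euc')->mime_encode;
print "Subject: $header\n";

# Decode the header again and ask for EUC-JP back.
my $back = Jcode->new($header)->mime_decode->euc;
print $back eq $euc ? "round trip ok\n" : "round trip changed the text\n";
```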
Hankaku vs. Zenkaku
$j->h2z([$keep_dakuten])
Converts X201 kana (Hankaku) to X208 kana (Zenkaku). When
$keep_dakuten is set, dakuten are left as is (that is, "ka + dakuten"
is left as is instead of being converted to "ga").
You can retrieve the number of matches via $j->nmatch;
$j->z2h
Converts X208 kana (Zenkaku) to X201 kana (Hankaku).
You can retrieve the number of matches via $j->nmatch;
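The effect of $keep_dakuten is easiest to see on the bytes. The following is a sketch, assuming Jcode is installed; the sample input and the hex_bytes() helper are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;
use Jcode;

# Hypothetical helper for this example: render a byte string as hex.
sub hex_bytes { join ' ', map { sprintf '%02X', ord } split //, shift }

# Illustrative input: halfwidth KA followed by halfwidth dakuten, in EUC-JP.
my $hankaku = "\x8E\xB6\x8E\xDE";

# Default: the pair is merged into a single fullwidth GA.
my $j       = Jcode->new($hankaku, 'euc');
my $zenkaku = $j->h2z->euc;
printf "merged: %s (nmatch=%d)\n", hex_bytes($zenkaku), $j->nmatch;

# With $keep_dakuten set, the dakuten stays a separate character.
my $kept = Jcode->new($hankaku, 'euc')->h2z(1)->euc;
printf "kept:   %s\n", hex_bytes($kept);
```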
Regexp emulators
To use "->m()" and "->s()", you need perl 5.8.1 or better.
$j->tr($from, $to, $opt);
Applies "tr/$from/$to/" on Jcode object where $from and $to are EUC-JP
strings. On perl 5.8.1 or better, $from and $to can also be flagged
UTF-8 strings.
If $opt is set, "tr/$from/$to/$opt" is applied. $opt must be 'c', 'd'
or the combination thereof.
You can retrieve the number of matches via $j->nmatch;
The following methods are available only for perl 5.8.1 or better.
$j->s($pattern, $replace, $opt);
Applies "s/$pattern/$replace/$opt". $pattern and $replace must be in
EUC-JP or flagged UTF-8. $opt takes the same values as regexp options;
see perlre for details.
Like "$j->tr()", "$j->s()" returns the object itself, so you can chain
the operations as follows:
$j->tr("a-z", "A-Z")->s("foo", "bar");
[@match = ] $j->m($pattern, $opt);
Applies "m/$pattern/$opt". Note that this method DOES NOT RETURN AN
OBJECT, so you can't chain it the way you can "$j->s()".
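A chained "tr()" and "s()" call might look like the sketch below. It assumes Jcode on perl 5.8.1 or better; the ASCII input and the 'g' option are illustrative choices.

```perl
use strict;
use warnings;
use Jcode;

# ASCII input keeps the example independent of any Japanese encoding.
my $j = Jcode->new("foo and food");

# tr() and s() both return the object, so the calls chain; 'g' is the
# usual global-replace regexp option.
$j->tr("a-z", "A-Z")->s("FOO", "bar", "g");

print $j->euc, "\n";
```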
Instance Variables
If you need to access the instance variables of a Jcode object, use the
access methods below instead of accessing them directly (that's what OOP
is all about).
FYI, Jcode uses a ref to an array instead of a ref to a hash (the common
way) to optimize speed. (You don't actually have to know this as long as
you use the access methods; once again, that's OOP.)
$j->r_str
Reference to the EUC-coded String.
$j->icode
Input character code of the most recent operation.
$j->nmatch
Number of matches (Used in $j->tr, etc.)
Subroutines
($code, [$nmatch]) = getcode($str)
Returns the character code of $str. The return codes are as follows:
ascii   Ascii (Contains no Japanese Code)
binary  Binary (Not Text File)
euc     EUC-JP
sjis    SHIFT_JIS
jis     JIS (ISO-2022-JP)
ucs2    UCS2 (Raw Unicode)
utf8    UTF8
When called in array context instead of scalar, it also returns how many
character codes were found. As mentioned above, $str can be \$str
instead.
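Detection can be tried on a couple of unambiguous inputs. This is a sketch, assuming Jcode is installed; the sample strings and hash keys are illustrative.

```perl
use strict;
use warnings;
use Jcode;

# Illustrative samples: plain ASCII, and HIRAGANA LETTER A in ISO-2022-JP.
my %sample = (
    ascii_text => "hello",
    jis_text   => "\e\$B\$\"\e(B",
);

for my $name (sort keys %sample) {
    # List context also yields the match count described above.
    my ($code, $nmatch) = Jcode::getcode($sample{$name});
    printf "%-10s => %s\n", $name, $code;
}
```

Very short EUC-JP or SJIS strings can be ambiguous, which is why this sketch sticks to ASCII and an escape-sequence encoding.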
jcode.pl Users: This function is 100% upward-compatible with
jcode::getcode() -- well, almost;
* When its return value is an array, the order is the opposite;
jcode::getcode() returns $nmatch first.
* jcode::getcode() returns 'undef' when the number of EUC characters
is equal to that of SJIS. Jcode::getcode() returns EUC. For
Jcode.pm there are no in-betweens.
Jcode::convert($str, [$ocode, $icode, $opt])
Converts $str to the character code specified by $ocode. When $icode is
also specified, it is assumed as the encoding of the input string instead
of the one detected by getcode(). As mentioned above, $str can be \$str
instead.
jcode.pl Users: This function is 100% upward-compatible with
jcode::convert() !
BUGS
For perl 5.8.1 or later, Jcode acts as a wrapper to Encode, meaning
Jcode is subject to bugs therein.
ACKNOWLEDGEMENTS
This package owes a lot in motivation, design, and code, to the jcode.pl
for Perl4 by Kazumasa Utashiro <utashiro AT iij.jp>.
Hiroki Ohzaki <ohzaki AT iod.jp> has helped me polish regexp from
the very first stage of development.
JEncode by makamaka AT donzoko.net has inspired me to integrate Encode into
Jcode. He has also contributed Japanese POD.
And the folks at the Jcode mailing list <jcode5 AT ring.jp>. Without them, I
couldn't have coded this far.
SEE ALSO
Encode
Jcode::Nihongo
<http://www.iana.org/assignments/character-sets>
COPYRIGHT
Copyright 1999-2005 Dan Kogai <dankogai AT dan.jp>
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.