Found in /usr/share/perl/5.34/pod/perlfaq4.pod
  How do I remove consecutive pairs of characters?
    (contributed by brian d foy)

    You can use the substitution operator to find pairs of characters (or
    runs of characters) and replace them with a single instance. In this
    substitution, we find a character in "(.)". The memory parentheses store
    the matched character in the back-reference "\g1" and we use that to
    require that the same thing immediately follow it. We replace that part
    of the string with the character in $1.

        s/(.)\g1/$1/g;

    We can also use the transliteration operator, "tr///". In this example,
    the search list side of our "tr///" contains nothing, but the "c" option
    complements that so it contains everything. The replacement list also
    contains nothing, so the transliteration is almost a no-op since it
    won't do any replacements (or more exactly, replace the character with
    itself). However, the "s" option squashes duplicated and consecutive
    characters in the string so a character does not show up next to itself:

        my $str = 'Haarlem';   # in the Netherlands
        $str =~ tr///cs;       # Now Harlem, like in New York
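
    The two approaches differ on runs longer than two characters. The
    following is a minimal sketch, not part of the original answer, that
    compares them on a made-up input; the "\g1+" variant is an assumption
    about what you may want when whole runs should collapse.

```perl
use strict;
use warnings;

my $input = 'aaab';    # hypothetical input with a run of three

(my $once   = $input) =~ s/(.)\g1/$1/g;     # removes one character per pair
(my $runs   = $input) =~ s/(.)\g1+/$1/g;    # collapses each whole run
(my $squash = $input) =~ tr///cs;           # squeezes each whole run

print "$once $runs $squash\n";              # aab ab ab
```

    Note that "." does not match a newline by default, so the substitution
    leaves repeated newlines alone unless you add the "s" flag.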

  How can I access or change N characters of a string?
    You can access the first characters of a string with substr(). To get
    the first character, for example, start at position 0 and grab the
    string of length 1.

        my $string = "Just another Perl Hacker";
        my $first_char = substr( $string, 0, 1 );  #  'J'

    To change part of a string, you can use the optional fourth argument
    which is the replacement string.

        substr( $string, 13, 4, "Perl 5.8.0" );

    You can also use substr() as an lvalue.

        substr( $string, 13, 4 ) =  "Perl 5.8.0";
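
    As a small sketch of how these forms behave together (the variable
    names $old and $tail are only for illustration): the four-argument
    form returns the text it replaced, and a negative offset counts from
    the end of the string.

```perl
use strict;
use warnings;

my $string = "Just another Perl Hacker";

# Four-argument form: returns the part of the string that was replaced.
my $old = substr( $string, 13, 4, "Perl 5.8.0" );    # 'Perl'

# A negative offset counts back from the end of the string.
my $tail = substr( $string, -6 );                    # 'Hacker'

# Lvalue form: the replacement may be shorter or longer than the original.
substr( $string, 0, 4 ) = "Yet";

print "$old | $tail | $string\n";
# Perl | Hacker | Yet another Perl 5.8.0 Hacker
```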

  How can I split a [character]-delimited string except when inside [character]?
    Several modules can handle this sort of parsing--Text::Balanced,
    Text::CSV, Text::CSV_XS, and Text::ParseWords, among others.

    Take the example case of trying to split a string that is
    comma-separated into its different fields. You can't use "split(/,/)"
    because you shouldn't split if the comma is inside quotes. For example,
    take a data line like this:

        SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

    Due to the restriction of the quotes, this is a fairly complex problem.
    Thankfully, we have Jeffrey Friedl, author of *Mastering Regular
    Expressions*, to handle this for us. He suggests (assuming your string
    is contained in $text):

         my @new = ();
         push(@new, $+) while $text =~ m{
             "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
            | ([^,]+),?
            | ,
         }gx;
         push(@new, undef) if substr($text,-1,1) eq ',';

    If you want to represent quotation marks inside a
    quotation-mark-delimited field, escape them with backslashes (e.g.,
    "like \"this\"").

    Alternatively, the Text::ParseWords module (part of the standard Perl
    distribution) lets you say:

        use Text::ParseWords;
        @new = quotewords(",", 0, $text);
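
    Applied to the sample data line from earlier in this answer, a
    complete script might look like the sketch below. The variable name
    @fields is only illustrative.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# A false second argument strips the quotes from the returned fields.
my @fields = quotewords( ',', 0, $text );

printf "%d fields\n", scalar @fields;    # 11 fields
print "third field: $fields[2]\n";       # third field: Cimetrix, Inc
```

    quotewords() also treats single quotes as quoting characters, so data
    containing a stray apostrophe may not split the way you expect.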

    For parsing or generating CSV, though, using Text::CSV rather than
    implementing it yourself is highly recommended; you'll save yourself odd
    bugs popping up later by just using code which has already been tried
    and tested in production for years.

Found in /usr/share/perl/5.34/pod/perlfaq5.pod
  How can I read a single character from a file? From the keyboard?
    You can use the builtin "getc()" function for most filehandles, but it
    won't (easily) work on a terminal device. For STDIN, either use the
    Term::ReadKey module from CPAN or use the sample code in "getc" in
    perlfunc.
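
    For an ordinary filehandle, a minimal sketch looks like this. It
    reads from an in-memory file so that it is self-contained; a handle
    opened on a real file is assumed to behave the same way.

```perl
use strict;
use warnings;

my $data = "Perl";
open( my $fh, '<', \$data ) or die "Can't open in-memory file: $!";

# getc() returns one character per call and undef at end of file.
my @chars;
while ( defined( my $char = getc($fh) ) ) {
    push @chars, $char;
}
close $fh;

print join( '|', @chars ), "\n";    # P|e|r|l
```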

    If your system supports the portable operating system programming
    interface (POSIX), you can use the following code, which you'll note
    turns off echo processing as well.

        #!/usr/bin/perl -w
        use strict;
        $| = 1;
        for (1..4) {
            print "gimme: ";
            my $got = getone();
            print "--> $got\n";
        }
        exit;

        BEGIN {
            use POSIX qw(:termios_h);

            my ($term, $oterm, $echo, $noecho, $fd_stdin);

            $fd_stdin = fileno(STDIN);

            $term     = POSIX::Termios->new();
            $term->getattr($fd_stdin);
            $oterm     = $term->getlflag();

            $echo     = ECHO | ECHOK | ICANON;
            $noecho   = $oterm & ~$echo;

            sub cbreak {
                $term->setlflag($noecho);
                $term->setcc(VTIME, 1);
                $term->setattr($fd_stdin, TCSANOW);
            }

            sub cooked {
                $term->setlflag($oterm);
                $term->setcc(VTIME, 0);
                $term->setattr($fd_stdin, TCSANOW);
            }

            sub getone {
                my $key = '';
                cbreak();
                sysread(STDIN, $key, 1);
                cooked();
                return $key;
            }
        }

        END { cooked() }

    The Term::ReadKey module from CPAN may be easier to use. Recent versions
    also include support for non-portable systems.

        use Term::ReadKey;
        open my $tty, '<', '/dev/tty'
            or die "Can't open /dev/tty: $!";
        print "Gimme a char: ";
        ReadMode "raw";
        my $key = ReadKey 0, $tty;
        ReadMode "normal";
        printf "\nYou said %s, char number %03d\n",
            $key, ord $key;

  How can I tell whether there's a character waiting on a filehandle?
    The very first thing you should do is look into getting the
    Term::ReadKey extension from CPAN. As we mentioned earlier, it now even
    has limited support for non-portable (read: not open systems, closed,
    proprietary, not POSIX, not Unix, etc.) systems.

    You should also check out the Frequently Asked Questions list in
    comp.unix.* for things like this: the answer is essentially the same.
    It's very system-dependent. Here's one solution that works on BSD
    systems:

        sub key_ready {
            my($rin, $nfd);
            vec($rin, fileno(STDIN), 1) = 1;
            return $nfd = select($rin,undef,undef,0);
        }
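
    The same polling idea can be written with the core IO::Select module,
    which wraps the four-argument select(). The sketch below is an
    illustration rather than part of the original answer: it uses a pipe
    instead of STDIN so that it runs without a terminal, and it assumes a
    Unix-like system where pipes are selectable.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

pipe( my $reader, my $writer ) or die "pipe failed: $!";
$writer->autoflush(1);

my $sel = IO::Select->new($reader);

# A timeout of 0 polls and returns immediately.
my @before = $sel->can_read(0);    # nothing has been written yet
print {$writer} "x";
my @after = $sel->can_read(0);     # one byte is now waiting

printf "ready before: %d, ready after: %d\n", scalar @before, scalar @after;
```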

    If you want to find out how many characters are waiting, you can also
    use the FIONREAD ioctl call. The *h2ph* tool that comes with Perl tries
    to convert C include files to Perl code, which can be "require"d.
    FIONREAD ends up defined as a function in the *sys/ioctl.ph* file:

        require './sys/ioctl.ph';

        $size = pack("L", 0);
        ioctl(FH, FIONREAD(), $size)    or die "Couldn't call ioctl: $!\n";
        $size = unpack("L", $size);

    If *h2ph* wasn't installed or doesn't work for you, you can *grep* the
    include files by hand:

        % grep FIONREAD /usr/include/*/*
        /usr/include/asm/ioctls.h:#define FIONREAD      0x541B

    Or write a small C program using the editor of champions:

        % cat > fionread.c
        #include <stdio.h>
        #include <sys/ioctl.h>
        int main(void) {
            printf("%#08x\n", (unsigned int) FIONREAD);
            return 0;
        }
        ^D
        % cc -o fionread fionread.c
        % ./fionread
        0x4004667f

    And then hard-code it, leaving porting as an exercise to your successor.

        $FIONREAD = 0x4004667f;         # XXX: opsys dependent

        $size = pack("L", 0);
        ioctl(FH, $FIONREAD, $size)     or die "Couldn't call ioctl: $!\n";
        $size = unpack("L", $size);

    FIONREAD requires a filehandle connected to a stream, meaning that
    sockets, pipes, and tty devices work, but *not* files.

Found in /usr/share/perl/5.34/pod/perlfaq6.pod
  How can I make "\w" match national character sets?
    Put "use locale;" in your script. The \w character class is taken from
    the current locale.

    See perllocale for details.

  How can I match strings with multibyte characters?
    Perl has had some level of multibyte character support since version
    5.6, but Perl 5.8 or later is recommended. Supported multibyte character
    repertoires include Unicode, and legacy encodings through the Encode
    module. See perluniintro, perlunicode, and Encode.
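
    As a brief sketch of what the Encode module provides (the sample word
    and variable names are only illustrative): once UTF-8 bytes are
    decoded into characters, length() and regular expressions work on
    characters rather than bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes    = "caf\xC3\xA9";    # "cafe" with an acute accent, as UTF-8 bytes
my $byte_len = length $bytes;    # 5 bytes

my $chars    = decode( 'UTF-8', $bytes );
my $char_len = length $chars;    # 4 characters

# After decoding, the accented letter counts as a word character.
my $is_word = $chars =~ /\A\w+\z/ ? 1 : 0;

print "$byte_len $char_len $is_word\n";    # 5 4 1
```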

    If you are stuck with older Perls, you can do Unicode with the
    Unicode::String module, and character conversions using the
    Unicode::Map8 and Unicode::Map modules. If you are using Japanese
    encodings, you might try jperl 5.005_03.

    Finally, the following set of approaches was offered by Jeffrey Friedl,
    whose article in issue #5 of The Perl Journal talks about this very
    matter.

    Let's suppose you have some weird Martian encoding where pairs of ASCII
    uppercase letters encode single Martian letters (i.e. the two bytes "CV"
    make a single Martian letter, as do the two bytes "SG", "VS", "XX",
    etc.). Other bytes represent single characters, just like ASCII.

    So, the string of Martian "I am CVSGXX!" uses 12 bytes to encode the
    nine characters 'I', ' ', 'a', 'm', ' ', 'CV', 'SG', 'XX', '!'.

    Now, say you want to search for the single character "/GX/". Perl
    doesn't know about Martian, so it'll find the two bytes "GX" in the "I
    am CVSGXX!" string, even though that character isn't there: it just
    looks like it is because "SG" is next to "XX", but there's no real "GX".
    This is a big problem.

    Here are a few ways, all painful, to deal with it:

        # Make sure adjacent "martian" bytes are no longer adjacent.
        $martian =~ s/([A-Z][A-Z])/ $1 /g;

        print "found GX!\n" if $martian =~ /GX/;

    Or like this:

        my @chars = $martian =~ m/([A-Z][A-Z]|[^A-Z])/g;
        # above is conceptually similar to:     my @chars = $text =~ m/(.)/g;
        #
        foreach my $char (@chars) {
            if ($char eq 'GX') {
                print "found GX!\n";
                last;
            }
        }

    Or like this:

        while ($martian =~ m/\G([A-Z][A-Z]|.)/gs) {  # \G probably unneeded
            if ($1 eq 'GX') {
                print "found GX!\n";
                last;
            }
        }

    Here's another, slightly less painful, way to do it from Benjamin
    Goldberg, who uses a zero-width negative look-behind assertion.

        print "found GX!\n" if    $martian =~ m/
            (?<![A-Z])
            (?:[A-Z][A-Z])*?
            GX
            /x;

    This succeeds if the "martian" character GX is in the string, and fails
    otherwise. If you don't like using (?<!), a zero-width negative
    look-behind assertion, you can replace (?<![A-Z]) with (?:^|[^A-Z]).

    It does have the drawback of putting the wrong thing in $-[0] and $+[0],
    but this usually can be worked around.
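
    The sketch below puts the look-behind pattern next to the naive
    match. The second test string and the capture group around GX are
    assumptions added for illustration: the group is one possible way to
    work around the offset drawback, since $-[1] then points at GX itself.

```perl
use strict;
use warnings;

my $no_gx  = "I am CVSGXX!";    # Martian letters: CV SG XX
my $has_gx = "I am CVGXSG!";    # Martian letters: CV GX SG (made-up example)

# Pair-aware pattern; the capture group records where GX itself starts.
my $martian_gx = qr/(?<![A-Z])(?:[A-Z][A-Z])*?(GX)/;

my $naive   = $no_gx  =~ /GX/        ? 1 : 0;    # 1: a false positive
my $aligned = $no_gx  =~ $martian_gx ? 1 : 0;    # 0: no real GX here
my $found   = $has_gx =~ $martian_gx ? 1 : 0;    # 1: a real GX

# After the last successful match, $-[0] is the start of the whole match
# (the leading "CV" pair), while $-[1] is the start of the captured GX.
my ( $match_start, $gx_start ) = ( $-[0], $-[1] );

print "$naive $aligned $found $match_start $gx_start\n";    # 1 0 1 5 7
```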

Found in /usr/share/perl/5.34/pod/perlfaq8.pod
  How do I trap control characters/signals?
    You don't actually "trap" a control character. Instead, that character
    generates a signal which is sent to your terminal's currently
    foregrounded process group, which you then trap in your process. Signals
    are documented in "Signals" in perlipc and the section on "Signals" in
    the Camel.

    You can set the values of the %SIG hash to be the functions you want to
    handle the signal. After perl catches the signal, it looks in %SIG for a
    key with the same name as the signal, then calls the subroutine value
    for that key.

        # as an anonymous subroutine

        $SIG{INT} = sub { syswrite(STDERR, "ouch\n", 5 ) };

        # or a reference to a function

        $SIG{INT} = \&ouch;

        # or the name of the function as a string

        $SIG{INT} = "ouch";

    Before version 5.8, perl's C source code contained signal handlers that
    would catch the signal and possibly run a Perl function that you had set
    in %SIG. This violated the rules of signal handling at that level and
    could cause perl to dump core. Since version 5.8.0, perl looks at %SIG
    after the signal has been caught, rather than while it is being caught.
    Previous versions of this answer were incorrect.
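
    To see a handler fire without pressing a key, a script can send the
    signal to itself. This is a minimal sketch assuming a Unix-like
    system; the $caught variable is only for illustration.

```perl
use strict;
use warnings;

my $caught = '';

# The handler receives the signal name (without the "SIG" prefix).
local $SIG{INT} = sub { my ($name) = @_; $caught = $name; };

# Send SIGINT to our own process, as the terminal would on an interrupt key.
kill 'INT', $$;

# With deferred signals, the handler has normally run by this point.
print "caught SIG$caught\n";
```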