Found in /usr/share/perl/5.34/pod/perlfaq4.pod How do I remove consecutive pairs of characters? (contributed by brian d foy) You can use the substitution operator to find pairs of characters (or runs of characters) and replace them with a single instance. In this substitution, we find a character in "(.)". The memory parentheses store the matched character in the back-reference "\g1" and we use that to require that the same thing immediately follow it. We replace that part of the string with the character in $1. s/(.)\g1/$1/g; We can also use the transliteration operator, "tr///". In this example, the search list side of our "tr///" contains nothing, but the "c" option complements that so it contains everything. The replacement list also contains nothing, so the transliteration is almost a no-op since it won't do any replacements (or more exactly, replace the character with itself). However, the "s" option squashes duplicated and consecutive characters in the string so a character does not show up next to itself my $str = 'Haarlem'; # in the Netherlands $str =~ tr///cs; # Now Harlem, like in New York How can I access or change N characters of a string? You can access the first characters of a string with substr(). To get the first character, for example, start at position 0 and grab the string of length 1. my $string = "Just another Perl Hacker"; my $first_char = substr( $string, 0, 1 ); # 'J' To change part of a string, you can use the optional fourth argument which is the replacement string. substr( $string, 13, 4, "Perl 5.8.0" ); You can also use substr() as an lvalue. substr( $string, 13, 4 ) = "Perl 5.8.0"; How can I split a [character]-delimited string except when inside [character]? Several modules can handle this sort of parsing--Text::Balanced, Text::CSV, Text::CSV_XS, and Text::ParseWords, among others. Take the example case of trying to split a string that is comma-separated into its different fields. You can't use "split(/,/)" because you shouldn't split if the comma is inside quotes. For example, take a data line like this: SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped" Due to the restriction of the quotes, this is a fairly complex problem. Thankfully, we have Jeffrey Friedl, author of *Mastering Regular Expressions*, to handle these for us. He suggests (assuming your string is contained in $text): my @new = (); push(@new, $+) while $text =~ m{ "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes | ([^,]+),? | , }gx; push(@new, undef) if substr($text,-1,1) eq ','; If you want to represent quotation marks inside a quotation-mark-delimited field, escape them with backslashes (eg, "like \"this\"". Alternatively, the Text::ParseWords module (part of the standard Perl distribution) lets you say: use Text::ParseWords; @new = quotewords(",", 0, $text); For parsing or generating CSV, though, using Text::CSV rather than implementing it yourself is highly recommended; you'll save yourself odd bugs popping up later by just using code which has already been tried and tested in production for years. Found in /usr/share/perl/5.34/pod/perlfaq5.pod How can I read a single character from a file? From the keyboard? You can use the builtin "getc()" function for most filehandles, but it won't (easily) work on a terminal device. For STDIN, either use the Term::ReadKey module from CPAN or use the sample code in "getc" in perlfunc. If your system supports the portable operating system programming interface (POSIX), you can use the following code, which you'll note turns off echo processing as well. #!/usr/bin/perl -w use strict; $| = 1; for (1..4) { print "gimme: "; my $got = getone(); print "--> $got\n"; } exit; BEGIN { use POSIX qw(:termios_h); my ($term, $oterm, $echo, $noecho, $fd_stdin); my $fd_stdin = fileno(STDIN); $term = POSIX::Termios->new(); $term->getattr($fd_stdin); $oterm = $term->getlflag(); $echo = ECHO | ECHOK | ICANON; $noecho = $oterm & ~$echo; sub cbreak { $term->setlflag($noecho); $term->setcc(VTIME, 1); $term->setattr($fd_stdin, TCSANOW); } sub cooked { $term->setlflag($oterm); $term->setcc(VTIME, 0); $term->setattr($fd_stdin, TCSANOW); } sub getone { my $key = ''; cbreak(); sysread(STDIN, $key, 1); cooked(); return $key; } } END { cooked() } The Term::ReadKey module from CPAN may be easier to use. Recent versions include also support for non-portable systems as well. use Term::ReadKey; open my $tty, '<', '/dev/tty'; print "Gimme a char: "; ReadMode "raw"; my $key = ReadKey 0, $tty; ReadMode "normal"; printf "\nYou said %s, char number %03d\n", $key, ord $key; How can I tell whether there's a character waiting on a filehandle? The very first thing you should do is look into getting the Term::ReadKey extension from CPAN. As we mentioned earlier, it now even has limited support for non-portable (read: not open systems, closed, proprietary, not POSIX, not Unix, etc.) systems. You should also check out the Frequently Asked Questions list in comp.unix.* for things like this: the answer is essentially the same. It's very system-dependent. Here's one solution that works on BSD systems: sub key_ready { my($rin, $nfd); vec($rin, fileno(STDIN), 1) = 1; return $nfd = select($rin,undef,undef,0); } If you want to find out how many characters are waiting, there's also the FIONREAD ioctl call to be looked at. The *h2ph* tool that comes with Perl tries to convert C include files to Perl code, which can be "require"d. FIONREAD ends up defined as a function in the *sys/ioctl.ph* file: require './sys/ioctl.ph'; $size = pack("L", 0); ioctl(FH, FIONREAD(), $size) or die "Couldn't call ioctl: $!\n"; $size = unpack("L", $size); If *h2ph* wasn't installed or doesn't work for you, you can *grep* the include files by hand: % grep FIONREAD /usr/include/*/* /usr/include/asm/ioctls.h:#define FIONREAD 0x541B Or write a small C program using the editor of champions: % cat > fionread.c #include <sys/ioctl.h> main() { printf("%#08x\n", FIONREAD); } ^D % cc -o fionread fionread.c % ./fionread 0x4004667f And then hard-code it, leaving porting as an exercise to your successor. $FIONREAD = 0x4004667f; # XXX: opsys dependent $size = pack("L", 0); ioctl(FH, $FIONREAD, $size) or die "Couldn't call ioctl: $!\n"; $size = unpack("L", $size); FIONREAD requires a filehandle connected to a stream, meaning that sockets, pipes, and tty devices work, but *not* files. Found in /usr/share/perl/5.34/pod/perlfaq6.pod How can I make "\w" match national character sets? Put "use locale;" in your script. The \w character class is taken from the current locale. See perllocale for details. How can I match strings with multibyte characters? Starting from Perl 5.6 Perl has had some level of multibyte character support. Perl 5.8 or later is recommended. Supported multibyte character repertoires include Unicode, and legacy encodings through the Encode module. See perluniintro, perlunicode, and Encode. If you are stuck with older Perls, you can do Unicode with the Unicode::String module, and character conversions using the Unicode::Map8 and Unicode::Map modules. If you are using Japanese encodings, you might try using the jperl 5.005_03. Finally, the following set of approaches was offered by Jeffrey Friedl, whose article in issue #5 of The Perl Journal talks about this very matter. Let's suppose you have some weird Martian encoding where pairs of ASCII uppercase letters encode single Martian letters (i.e. the two bytes "CV" make a single Martian letter, as do the two bytes "SG", "VS", "XX", etc.). Other bytes represent single characters, just like ASCII. So, the string of Martian "I am CVSGXX!" uses 12 bytes to encode the nine characters 'I', ' ', 'a', 'm', ' ', 'CV', 'SG', 'XX', '!'. Now, say you want to search for the single character "/GX/". Perl doesn't know about Martian, so it'll find the two bytes "GX" in the "I am CVSGXX!" string, even though that character isn't there: it just looks like it is because "SG" is next to "XX", but there's no real "GX". This is a big problem. Here are a few ways, all painful, to deal with it: # Make sure adjacent "martian" bytes are no longer adjacent. $martian =~ s/([A-Z][A-Z])/ $1 /g; print "found GX!\n" if $martian =~ /GX/; Or like this: my @chars = $martian =~ m/([A-Z][A-Z]|[^A-Z])/g; # above is conceptually similar to: my @chars = $text =~ m/(.)/g; # foreach my $char (@chars) { print "found GX!\n", last if $char eq 'GX'; } Or like this: while ($martian =~ m/\G([A-Z][A-Z]|.)/gs) { # \G probably unneeded if ($1 eq 'GX') { print "found GX!\n"; last; } } Here's another, slightly less painful, way to do it from Benjamin Goldberg, who uses a zero-width negative look-behind assertion. print "found GX!\n" if $martian =~ m/ (?<![A-Z]) (?:[A-Z][A-Z])*? GX /x; This succeeds if the "martian" character GX is in the string, and fails otherwise. If you don't like using (?<!), a zero-width negative look-behind assertion, you can replace (?<![A-Z]) with (?:^|[^A-Z]). It does have the drawback of putting the wrong thing in $-[0] and $+[0], but this usually can be worked around. Found in /usr/share/perl/5.34/pod/perlfaq8.pod How do I trap control characters/signals? You don't actually "trap" a control character. Instead, that character generates a signal which is sent to your terminal's currently foregrounded process group, which you then trap in your process. Signals are documented in "Signals" in perlipc and the section on "Signals" in the Camel. You can set the values of the %SIG hash to be the functions you want to handle the signal. After perl catches the signal, it looks in %SIG for a key with the same name as the signal, then calls the subroutine value for that key. # as an anonymous subroutine $SIG{INT} = sub { syswrite(STDERR, "ouch\n", 5 ) }; # or a reference to a function $SIG{INT} = \&ouch; # or the name of the function as a string $SIG{INT} = "ouch"; Perl versions before 5.8 had in its C source code signal handlers which would catch the signal and possibly run a Perl function that you had set in %SIG. This violated the rules of signal handling at that level causing perl to dump core. Since version 5.8.0, perl looks at %SIG after the signal has been caught, rather than while it is being caught. Previous versions of this answer were incorrect.
Generated by phpMan v3.7.7 Author: Che Dong Under GNU General Public License
2026-06-10 09:14 @216.73.217.62
CrawledBy Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)