NAME
    Lingua::Stem::EnBroken - Porter's stemming algorithm for 'generic' English

SYNOPSIS
        use Lingua::Stem::EnBroken;
        my $stems = Lingua::Stem::EnBroken::stem({
            -words      => $word_list_reference,
            -locale     => 'en',
            -exceptions => $exceptions_hash,
        });

DESCRIPTION
    This routine MIS-applies the Porter Stemming Algorithm to its parameters, returning the stemmed
    words. It is an intentionally broken version of Lingua::Stem::En for people needing backwards
    compatibility with Lingua::Stem 0.30 and Lingua::Stem 0.40. Do not use it if you aren't one of
    those people.

    It is derived from the C program "stemmer.c" as found in freewais and elsewhere, which contains
    these notes:

       Purpose:    Implementation of the Porter stemming algorithm documented
                   in: Porter, M.F., "An Algorithm For Suffix Stripping,"
                   Program 14 (3), July 1980, pp. 130-137.
       Provenance: Written by B. Frakes and C. Cox, 1986.

    I have re-interpreted areas that use Frakes and Cox's "WordSize" function. My version may
    misbehave on short words starting with "y", but I can't think of any examples.
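
    Porter's rules are conditioned on the "measure" m of a stem, the number of
    vowel-consonant sequences when the word is written as [C](VC)^m[V]; in
    stemmer.c this count is what WordSize computes. The following is a minimal
    sketch of that computation using Porter's own definition of a consonant, in
    which "y" is a consonant at the start of a word or after a vowel, and a
    vowel after a consonant. The subroutine names is_consonant and measure are
    hypothetical; this is not code from this module, whose re-interpretation
    may treat "y" differently.

```perl
use strict;
use warnings;

# Porter's definition: a, e, i, o, u are vowels; "y" is a vowel only when it
# follows a consonant; every other letter is a consonant.
sub is_consonant {
    my ($word, $i) = @_;
    my $ch = substr($word, $i, 1);
    return 0 if $ch =~ /[aeiou]/;
    if ($ch eq 'y') {
        return 1 if $i == 0;                        # leading "y" is a consonant
        return is_consonant($word, $i - 1) ? 0 : 1; # "y" after a consonant is a vowel
    }
    return 1;
}

# Measure m of a lowercase word written as [C](VC)^m[V]:
# count each position where a consonant directly follows a vowel.
sub measure {
    my ($word) = @_;
    my ($m, $prev_vowel) = (0, 0);
    for my $i (0 .. length($word) - 1) {
        my $cons = is_consonant($word, $i);
        $m++ if $cons && $prev_vowel;
        $prev_vowel = !$cons;
    }
    return $m;
}

# Sample words with expected measures 0, 0, 1, 1, 2, 2.
printf "%-8s m=%d\n", $_, measure($_) for qw(tree by trouble ivy troubles orrery);
```

    Note that "by" has m = 0 because its "y" follows a consonant and so counts
    as a vowel, while the "y" in "toy" follows a vowel and counts as a consonant.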

    The step numbers correspond to Frakes and Cox, and are probably in Porter's article (which I've
    not seen). Porter's algorithm still has rough spots (e.g. current/currency, -ings words), which
    I've not attempted to cure, although I have added support for the British -ise suffix.

CHANGES
     2003.09.28 -  Documentation fix

     2000.09.14 -  Forked from the Lingua::Stem::En.pm module to provide
                   a backward compatibly broken version for people needing
                   consistent behavior with 0.30 and 0.40 more than accurate
                   stemming.

METHODS
    stem({ -words => \@words, -locale => 'en', -exceptions => \%exceptions });
        Stems the words in the list passed as -words using the rules of US English and returns a
        reference to an anonymous array containing the stemmed words.

        Example:

          my $stemmed_words = Lingua::Stem::EnBroken::stem({
              -words      => \@words,
              -locale     => 'en',
              -exceptions => \%exceptions,
          });
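
        The -exceptions hash presumably maps a word to the stem that should be
        returned for it instead of the computed one. A minimal sketch of such a
        lookup wrapped around an arbitrary stemming routine follows; it assumes
        that words are lowercased before the lookup, which is an assumption about
        this module rather than documented behavior. The subroutine
        stem_with_exceptions and the toy stemmer are hypothetical and not part of
        Lingua::Stem::EnBroken.

```perl
use strict;
use warnings;

# Hypothetical wrapper: look each word up in the exceptions hash first and
# only fall back to the stemming routine for words that are not listed.
# Lowercasing before the lookup is an assumption, not documented behavior.
sub stem_with_exceptions {
    my ($words, $exceptions, $stemmer) = @_;
    my @stems;
    for my $word (@$words) {
        my $lc = lc $word;
        push @stems, exists($exceptions->{$lc}) ? $exceptions->{$lc}
                                                : $stemmer->($lc);
    }
    return \@stems;    # anonymous array reference, as stem() is documented to return
}

# Toy stemmer for the demo: strip a single trailing "s".
my $strip_s = sub { my $w = shift; $w =~ s/s$//; return $w };

my $stems = stem_with_exceptions([qw(Cats news dogs)], { news => 'news' }, $strip_s);
print "@$stems\n";    # prints: cat news dog
```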

    stem_caching({ -level => 0|1|2 });
        Sets the level of stem caching.

        '0' means 'no caching'. This is the default level.

        '1' means 'cache per run'. This caches stemming results during a single call to 'stem'.

        '2' means 'cache indefinitely'. This caches stemming results until either the process exits
        or the 'clear_stem_cache' method is called.

    clear_stem_cache;
        Clears the cache of stemmed words.
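
        The three caching levels can be illustrated with a small self-contained
        sketch. It is not this module's implementation: the names set_cache_level,
        clear_cache and cached_stem_run, and the call-counting toy stemmer, are
        hypothetical. Level 0 never stores results, level 1 empties the cache when
        each run finishes, and level 2 keeps entries until the cache is cleared.

```perl
use strict;
use warnings;

my $cache_level = 0;   # 0 = no caching, 1 = per run, 2 = indefinitely
my %cache;             # word => stem

sub set_cache_level {
    my ($level) = @_;
    die "cache level must be 0, 1 or 2\n"
        unless defined $level && $level =~ /\A[012]\z/;
    $cache_level = $level;
    %cache = () if $level == 0;    # turning caching off drops old entries
}

sub clear_cache { %cache = () }

# Stem a list of words, consulting the cache according to the current level.
sub cached_stem_run {
    my ($words, $stemmer) = @_;
    my @stems;
    for my $word (@$words) {
        if ($cache_level && exists $cache{$word}) {
            push @stems, $cache{$word};
            next;
        }
        my $stem = $stemmer->($word);
        $cache{$word} = $stem if $cache_level;
        push @stems, $stem;
    }
    %cache = () if $cache_level == 1;   # a per-run cache ends with the call
    return \@stems;
}

# Demo: count how often the toy stemmer is actually invoked.
my $calls   = 0;
my $stemmer = sub { $calls++; my $w = shift; $w =~ s/s$//; return $w };

set_cache_level(2);
cached_stem_run([qw(cats cats dogs)], $stemmer);   # 2 calls: "cats" is reused
cached_stem_run([qw(cats)], $stemmer);             # 0 calls: still cached
print "calls: $calls\n";                           # prints: calls: 2
```

        Caching trades memory for speed: level 2 helps most when the same
        vocabulary is stemmed repeatedly, at the cost of a cache that grows until
        it is cleared.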

NOTES
    This code is almost entirely derived from the Porter 2.1 module written by Jim Richardson.

SEE ALSO
     Lingua::Stem

AUTHOR
      Jim Richardson, University of Sydney
      jimr AT maths.au or http://www.maths.usyd.edu.au:8000/jimr.html

      Integration in Lingua::Stem by
      Jerilyn Franz, FreeRun Technologies,
      <cpan AT jerilyn.info>

COPYRIGHT
    Jim Richardson, University of Sydney
    Jerilyn Franz, FreeRun Technologies

    This code is freely available under the same terms as Perl.

BUGS
TODO