NAME
    Lingua::Stem::En - Porter's stemming algorithm for 'generic' English

SYNOPSIS
        use Lingua::Stem::En;
        my $stems   = Lingua::Stem::En::stem({ -words => $word_list_reference,
                                            -locale => 'en',
                                        -exceptions => $exceptions_hash,
                                         });

DESCRIPTION
    This routine applies the Porter Stemming Algorithm to its parameters,
    returning the stemmed words.

    It is derived from the C program "stemmer.c" as found in freewais and
    elsewhere, which contains these notes:

       Purpose:    Implementation of the Porter stemming algorithm documented
                   in: Porter, M.F., "An Algorithm For Suffix Stripping,"
                   Program 14 (3), July 1980, pp. 130-137.
       Provenance: Written by B. Frakes and C. Cox, 1986.

    I have re-interpreted areas that use Frakes and Cox's "WordSize"
    function. My version may misbehave on short words starting with "y", but
    I can't think of any examples.

    The step numbers correspond to Frakes and Cox, and are probably in
    Porter's article (which I've not seen). Porter's algorithm still has
    rough spots (e.g. current/currency, -ings words), which I've not
    attempted to cure, although I have added support for the British -ise
    suffix.

CHANGES
     1999.06.15 - Changed to '.pm' module, moved into Lingua::Stem namespace,
                  optionalized the export of the 'stem' routine
                  into the caller's namespace, added named parameters

     1999.06.24 - Switch core implementation of the Porter stemmer to
                  the one written by Jim Richardson <jimr AT maths.au>

     2000.08.25 - 2.11 Added stemming cache

     2000.09.14 - 2.12 Fixed *major* :( implementation error of Porter's algorithm
                  Error was entirely my fault - I completely forgot to include
                  rule sets 2,3, and 4 starting with Lingua::Stem 0.30.
                  -- Jerilyn Franz

     2003.09.28 - 2.13 Corrected documentation error pointed out by Simon Cozens.

     2005.11.20 - 2.14 Changed rule declarations to conform to Perl style convention
                  for 'private' subroutines. Changed Exporter invocation to more
                  portable 'require' instead of 'use'.

     2006.02.14 - 2.15 Added ability to pass word list by 'handle' for in-place stemming.

     2009.07.27 - 2.16 Documentation Fix

     2020.06.20 - 2.30 Version renumber for module consistency.

METHODS
    stem({ -words => \@words, -locale => 'en', -exceptions => \%exceptions
    });
        Stems a list of passed words using the rules of US English. Returns
        an anonymous array reference to the stemmed words.

        Example:

          my @words         = ( 'wordy', 'another' );
          my $stemmed_words = Lingua::Stem::En::stem({ -words => \@words,
                                                      -locale => 'en',
                                                  -exceptions => \%exceptions,
                                  });
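        The example above leaves %exceptions undefined. The following is a
        minimal sketch of a complete call. It assumes that each key of the
        exceptions hash is a lowercased input word and that its value is
        returned as the stem in place of the algorithmic result; that
        reading is an assumption, not something this page states. The
        sample words are illustrative only.

```perl
use strict;
use warnings;
use Lingua::Stem::En;

# Assumption: key = lowercased input word, value = stem to return for it.
my %exceptions = ( 'emily' => 'emily' );

my @words = ( 'emily', 'wordy', 'another' );
my $stems = Lingua::Stem::En::stem({ -words      => \@words,
                                     -locale     => 'en',
                                     -exceptions => \%exceptions,
                                  });

# One stem is returned per input word, in the same order.
printf "%-10s => %s\n", $words[$_], $stems->[$_] for 0 .. $#words;
```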

        If the first element of @words is a list reference, then the
        stemming is performed 'in place' on that list (modifying the passed
        list directly instead of copying it to a new array).

        This is only useful if you do not need to keep the original list. If
        you do need to keep it, let 'stem' return a new list as usual: that
        is faster than making your own copy and then stemming 'in place',
        because the main difference between 'in place' and 'by value'
        stemming is the creation of a copy of the original list. If you
        don't need the original list, 'in place' stemming is about 60%
        faster.

        Example of 'in place' stemming:

          my $words         = [ 'wordy', 'another' ];
          my $stemmed_words = Lingua::Stem::En::stem({ -words => [$words],
                                  -locale => 'en',
                              -exceptions => \%exceptions,
                              });

        The 'in place' mode returns a reference to the original list with
        the words stemmed.
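
        The two calling styles can be compared side by side. This is a
        sketch based only on the behaviour described above; the variable
        names and the empty exceptions hash are illustrative.

```perl
use strict;
use warnings;
use Lingua::Stem::En;

my %exceptions = ();

# By value: @originals is left untouched and a new list is returned.
my @originals = ( 'wordy', 'another' );
my $copy = Lingua::Stem::En::stem({ -words      => \@originals,
                                    -locale     => 'en',
                                    -exceptions => \%exceptions,
                                 });

# In place: the list reference is wrapped in another array reference so
# that it is the first element of -words; the list itself is rewritten.
my $words = [ 'wordy', 'another' ];
my $same  = Lingua::Stem::En::stem({ -words      => [$words],
                                     -locale     => 'en',
                                     -exceptions => \%exceptions,
                                  });

print "by value kept the input: @originals\n";
print "in place rewrote it:     @$words\n";
print "the original list reference was returned\n" if $same == $words;
```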

    stem_caching({ -level => 0|1|2 });
        Sets the level of stem caching.

        '0' means 'no caching'. This is the default level.

        '1' means 'cache per run'. This caches stemming results during a
        single call to 'stem'.

        '2' means 'cache indefinitely'. This caches stemming results until
        either the process exits or the 'clear_stem_cache' method is called.

    clear_stem_cache;
        Clears the cache of stemmed words.
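
        A short sketch of how the caching calls might be combined. It
        assumes that 'stem_caching' and 'clear_stem_cache' are called as
        plain functions with the package prefix, in the same way as 'stem';
        the word list is illustrative.

```perl
use strict;
use warnings;
use Lingua::Stem::En;

my %exceptions = ();
my @words      = ( 'wordy', 'another', 'wordy' );
my %args       = ( -words      => \@words,
                   -locale     => 'en',
                   -exceptions => \%exceptions );

# Level 2: keep stemming results across calls until explicitly cleared.
Lingua::Stem::En::stem_caching({ -level => 2 });

my $first  = Lingua::Stem::En::stem({ %args });
my $second = Lingua::Stem::En::stem({ %args });   # may reuse cached stems

# Discard the cached stems, then return to the default of no caching.
Lingua::Stem::En::clear_stem_cache();
Lingua::Stem::En::stem_caching({ -level => 0 });

print "cached and uncached results agree\n" if "@$first" eq "@$second";
```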

NOTES
    This code is almost entirely derived from the Porter 2.1 module written
    by Jim Richardson.

SEE ALSO
     Lingua::Stem

AUTHOR
      Jim Richardson, University of Sydney
      jimr AT maths.au or http://www.maths.usyd.edu.au:8000/jimr.html

      Integration in Lingua::Stem by
      Jerilyn Franz, FreeRun Technologies,
      <cpan AT jerilyn.info>

COPYRIGHT
    Jim Richardson, University of Sydney
    Jerilyn Franz, FreeRun Technologies

    This code is freely available under the same terms as Perl.

BUGS
TODO
