{
    "mode": "perldoc",
    "parameter": "String::Approx",
    "section": "",
    "url": "https://www.chedong.com/phpMan.php/perldoc/String%3A%3AApprox/json",
    "generated": "2026-06-10T19:34:41Z",
    "synopsis": "use String::Approx 'amatch';\nprint if amatch(\"foobar\");\nmy @matches = amatch(\"xyzzy\", @inputs);\nmy @catches = amatch(\"plugh\", ['2'], @inputs);",
    "sections": {
        "NAME": {
            "content": "String::Approx - Perl extension for approximate matching (fuzzy matching)\n",
            "subsections": []
        },
        "SYNOPSIS": {
            "content": "use String::Approx 'amatch';\n\nprint if amatch(\"foobar\");\n\nmy @matches = amatch(\"xyzzy\", @inputs);\n\nmy @catches = amatch(\"plugh\", ['2'], @inputs);\n",
            "subsections": []
        },
        "DESCRIPTION": {
            "content": "String::Approx lets you match and substitute strings approximately. With this you can emulate\nerrors: typing errors, spelling errors, closely related vocabularies (colour color), genetic\nmutations (GAG ACT), abbreviations (McScot, MacScot).\n\nNOTE: String::Approx suits the task of string matching, not string comparison, and it works for\nstrings, not for text.\n\nIf you want to compare strings for similarity, you probably just want the Levenshtein edit\ndistance (explained below), the Text::Levenshtein and Text::LevenshteinXS modules in CPAN. See\nalso Text::WagnerFischer and Text::PhraseDistance. (There are functions for this in\nString::Approx, e.g. adist(), but their results sometimes differ from the bare Levenshtein et\nal.)\n\nIf you want to compare things like text or source code, consisting of words or tokens and\nphrases and sentences, or expressions and statements, you should probably use some other tool\nthan String::Approx, like for example the standard UNIX diff(1) tool, or the Algorithm::Diff\nmodule from CPAN.\n\nThe measure of approximateness is the *Levenshtein edit distance*. It is the total number of\n\"edits\": insertions,\n\nword world\n\ndeletions,\n\nmonkey money\n\nand substitutions\n\nsun fun\n\nrequired to transform a string to another string. For example, to transform *\"lead\"* into\n*\"gold\"*, you need three edits:\n\nlead gead goad gold\n\nThe edit distance of \"lead\" and \"gold\" is therefore three, or 75%.\n\nString::Approx uses the Levenshtein edit distance as its measure, but String::Approx is not\nwell-suited for comparing strings of different length, in other words, if you want a \"fuzzy eq\",\nsee above. String::Approx is more like regular expressions or index(), it finds substrings that\nare close matches.>\n",
            "subsections": []
        },
        "MATCH": {
            "content": "use String::Approx 'amatch';\n\n$matched     = amatch(\"pattern\")\n$matched     = amatch(\"pattern\", [ modifiers ])\n\n$anymatched = amatch(\"pattern\", @inputs)\n$anymatched = amatch(\"pattern\", [ modifiers ], @inputs)\n\n@match       = amatch(\"pattern\")\n@match       = amatch(\"pattern\", [ modifiers ])\n\n@matches     = amatch(\"pattern\", @inputs)\n@matches     = amatch(\"pattern\", [ modifiers ], @inputs)\n\nMatch pattern approximately. In list context return the matched @inputs. If no inputs are given,\nmatch against the $. In scalar context return true if *any* of the inputs match, false if none\nmatch.\n\nNotice that the pattern is a string. Not a regular expression. None of the regular expression\nnotations (^, ., *, and so on) work. They are characters just like the others. Note-on-note:\nsome limited form of *\"regular expressionism\"* is planned in future: for example character\nclasses ([abc]) and *any-chars* (.). But that feature will be turned on by a special *modifier*\n(just a guess: \"r\"), so there should be no backward compatibility problem.\n\nNotice also that matching is not symmetric. The inputs are matched against the pattern, not the\nother way round. In other words: the pattern can be a substring, a submatch, of an input\nelement. An input element is always a superstring of the pattern.\n\nMODIFIERS\nWith the modifiers you can control the amount of approximateness and certain other control\nvariables. The modifiers are one or more strings, for example \"i\", within a string optionally\nseparated by whitespace. The modifiers are inside an anonymous array: the [ ] in the syntax are\nnot notational, they really do mean [ ], for example [ \"i\", \"2\" ]. [\"2 i\"] would be identical.\n\nThe implicit default approximateness is 10%, rounded up. In other words: every tenth character\nin the pattern may be an error, an edit. You can explicitly set the maximum approximateness by\nsupplying a modifier like\n\nnumber\nnumber%\n\nExamples: \"3\", \"15%\".\n\nNote that \"0%\" is not rounded up, it is equal to 0.\n\nUsing a similar syntax you can separately control the maximum number of insertions, deletions,\nand substitutions by prefixing the numbers with I, D, or S, like this:\n\nInumber\nInumber%\nDnumber\nDnumber%\nSnumber\nSnumber%\n\nExamples: \"I2\", \"D20%\", \"S0\".\n\nYou can ignore case (\"A\" becames equal to \"a\" and vice versa) by adding the \"i\" modifier.\n\nFor example\n\n[ \"i 25%\", \"S0\" ]\n\nmeans *ignore case*, *allow every fourth character to be \"an edit\"*, but allow *no\nsubstitutions*. (See NOTES about disallowing substitutions or insertions.)\n\nNOTE: setting \"I0 D0 S0\" is not equivalent to using index(). If you want to use index(), use",
            "subsections": [
                {
                    "name": "index",
                    "content": ""
                }
            ]
        },
        "SUBSTITUTE": {
            "content": "use String::Approx 'asubstitute';\n\n@substituted = asubstitute(\"pattern\", \"replacement\")\n@substituted = asubstitute(\"pattern\", \"replacement\", @inputs)\n@substituted = asubstitute(\"pattern\", \"replacement\", [ modifiers ])\n@substituted = asubstitute(\"pattern\", \"replacement\",\n[ modifiers ], @inputs)\n\nSubstitute approximate pattern with replacement and return as a list <copies> of @inputs, the\nsubstitutions having been made on the elements that did match the pattern. If no inputs are\ngiven, substitute in the $. The replacement can contain magic strings $&, $`, $' that stand for\nthe matched string, the string before it, and the string after it, respectively. All the other\narguments are as in \"amatch()\", plus one additional modifier, \"g\" which means substitute\nglobally (all the matches in an element and not just the first one, as is the default).\n\nSee \"BAD NEWS\" about the unfortunate stinginess of \"asubstitute()\".\n",
            "subsections": []
        },
        "INDEX": {
            "content": "use String::Approx 'aindex';\n\n$index   = aindex(\"pattern\")\n@indices = aindex(\"pattern\", @inputs)\n$index   = aindex(\"pattern\", [ modifiers ])\n@indices = aindex(\"pattern\", [ modifiers ], @inputs)\n\nLike \"amatch()\" but returns the index/indices at which the pattern matches approximately. In\nlist context and if @inputs are used, returns a list of indices, one index for each input\nelement. If there's no approximate match, -1 is returned as the index.\n\nNOTE: if there is character repetition (e.g. \"aa\") either in the pattern or in the text, the\nreturned index might start \"too early\". This is consistent with the goal of the module of\nmatching \"as early as possible\", just like regular expressions (that there might be a \"less\napproximate\" match starting later is of somewhat irrelevant).\n\nThere's also backwards-scanning \"arindex()\".\n",
            "subsections": []
        },
        "SLICE": {
            "content": "use String::Approx 'aslice';\n\n($index, $size)   = aslice(\"pattern\")\n([$i0, $s0], ...) = aslice(\"pattern\", @inputs)\n($index, $size)   = aslice(\"pattern\", [ modifiers ])\n([$i0, $s0], ...) = aslice(\"pattern\", [ modifiers ], @inputs)\n\nLike \"aindex()\" but returns also the size (length) of the match. If the match fails, returns an\nempty list (when matching against $) or an empty anonymous list corresponding to the particular\ninput.\n\nNOTE: size of the match will very probably be something you did not expect (such as longer than\nthe pattern, or a negative number). This may or may not be fixed in future releases. Also the\nbeginning of the match may vary from the expected as with aindex(), see above.\n\nIf the modifier\n\n\"minimaldistance\"\n\nis used, the minimal possible edit distance is returned as the third element:\n\n($index, $size, $distance) = aslice(\"pattern\", [ modifiers ])\n([$i0, $s0, $d0], ...)     = aslice(\"pattern\", [ modifiers ], @inputs)\n",
            "subsections": []
        },
        "DISTANCE": {
            "content": "use String::Approx 'adist';\n\n$dist = adist(\"pattern\", $input);\n@dist = adist(\"pattern\", @input);\n\nReturn the *edit distance* or distances between the pattern and the input or inputs. Zero edit\ndistance means exact match. (Remember that the match can 'float' in the inputs, the match is a\nsubstring match.) If the pattern is longer than the input or inputs, the returned distance or\ndistances is or are negative.\n\nuse String::Approx 'adistr';\n\n$dist = adistr(\"pattern\", $input);\n@dist = adistr(\"pattern\", @inputs);\n\nReturn the relative *edit distance* or distances between the pattern and the input or inputs.\nZero relative edit distance means exact match, one means completely different. (Remember that\nthe match can 'float' in the inputs, the match is a substring match.) If the pattern is longer\nthan the input or inputs, the returned distance or distances is or are negative.\n\nYou can use adist() or adistr() to sort the inputs according to their approximateness:\n\nmy %d;\n@d{@inputs} = map { abs } adistr(\"pattern\", @inputs);\nmy @d = sort { $d{$a} <=> $d{$b} } @inputs;\n\nNow @d contains the inputs, the most like \"pattern\" first.\n",
            "subsections": []
        },
        "CONTROLLING THE CACHE": {
            "content": "\"String::Approx\" maintains a LU (least-used) cache that holds the 'matching engines' for each\ninstance of a *pattern+modifiers*. The cache is intended to help the case where you match a\nsmall set of patterns against a large set of string. However, the more engines you cache the\nmore you eat memory. If you have a lot of different patterns or if you have a lot of memory to\nburn, you may want to control the cache yourself. For example, allowing a larger cache consumes\nmore memory but probably runs a little bit faster since the cache fills (and needs flushing)\nless often.\n\nThe cache has two parameters: *max* and *purge*. The first one is the maximum size of the cache\nand the second one is the cache flushing ratio: when the number of cache entries exceeds *max*,\n*max* times *purge* cache entries are flushed. The default values are 1000 and 0.75,\nrespectively, which means that when the 1001st entry would be cached, 750 least used entries\nwill be removed from the cache. To access the parameters you can use the calls\n\n$nowmax = String::Approx::cachemax();\nString::Approx::cachemax($newmax);\n\n$nowpurge = String::Approx::cachepurge();\nString::Approx::cachepurge($newpurge);\n\n$limit = String::Approx::cachenpurge();\n\nTo be honest, there are actually two caches: the first one is used far the patterns with no\nmodifiers, the second one for the patterns with pattern modifiers. Using the standard parameters\nyou will therefore actually cache up to 2000 entries. The above calls control both caches for\nthe same price.\n\nTo disable caching completely use\n\nString::Approx::cachedisable();\n\nNote that this doesn't flush any possibly existing cache entries, to do that use\n\nString::Approx::cacheflushall();\n",
            "subsections": []
        },
        "NOTES": {
            "content": "Because matching is by *substrings*, not by whole strings, insertions and substitutions produce\noften very similar results: \"abcde\" matches \"axbcde\" either by insertion or substitution of \"x\".\n\nThe maximum edit distance is also the maximum number of edits. That is, the \"I2\" in\n\namatch(\"abcd\", [\"I2\"])\n\nis useless because the maximum edit distance is (implicitly) 1. You may have meant to say\n\namatch(\"abcd\", [\"2D1S1\"])\n\nor something like that.\n\nIf you want to simulate transposes\n\nfeet fete\n\nyou need to allow at least edit distance of two because in terms of our edit primitives a\ntranspose is first one deletion and then one insertion.\n\nTEXT POSITION\nThe starting and ending positions of matching, substituting, indexing, or slicing can be changed\nfrom the beginning and end of the input(s) to some other positions by using either or both of\nthe modifiers\n\n\"initialposition=24\"\n\"finalposition=42\"\n\nor the both the modifiers\n\n\"initialposition=24\"\n\"positionrange=10\"\n\nBy setting the \"positionrange\" to be zero you can limit (anchor) the operation to happen only\nonce (if a match is possible) at the position.\n",
            "subsections": []
        },
        "VERSION": {
            "content": "Major release 3.\n",
            "subsections": []
        },
        "CHANGES FROM VERSION 2": {
            "content": "GOOD NEWS\nThe version 3 is 2-3 times faster than version 2\nNo pattern length limitation\nThe algorithm is independent on the pattern length: its time complexity is *O(kn)*, where\n*k* is the number of edits and *n* the length of the text (input). The preprocessing of the\npattern will of course take some *O(m)* (*m* being the pattern length) time, but \"amatch()\"\nand \"asubstitute()\" cache the result of this preprocessing so that it is done only once per\npattern.\n\nBAD NEWS\nYou do need a C compiler to install the module\nPerl's regular expressions are no more used; instead a faster and more scalable algorithm\nwritten in C is used.\n\n\"asubstitute()\" is now always stingy\nThe string matched and substituted is now always stingy, as short as possible. It used to be\nas long as possible. This is an unfortunate change stemming from switching the matching\nalgorithm. Example: with edit distance of two and substituting for \"word\" from \"cork\" and\n\"wool\" previously did match \"cork\" and \"wool\". Now it does match \"or\" and \"wo\". As little as\npossible, or, in other words, with as much approximateness, as many edits, as possible.\nBecause there is no *need* to match the \"c\" of \"cork\", it is not matched.\n\nno more \"aregex()\" because regular expressions are no more used\nno more \"compat1\" for String::Approx version 1 compatibility\n",
            "subsections": []
        },
        "ACKNOWLEDGEMENTS": {
            "content": "The following people have provided valuable test cases, documentation clarifications, and other\nfeedback:\n\nJared August, Arthur Bergman, Anirvan Chatterjee, Steve A. Chervitz, Aldo Calpini, David Curiel,\nTeun van den Dool, Alberto Fontaneda, Rob Fugina, Dmitrij Frishman, Lars Gregersen, Kevin\nGreiner, B. Elijah Griffin, Mike Hanafey, Mitch Helle, Ricky Houghton, 'idallen', Helmut\nJarausch, Damian Keefe, Ben Kennedy, Craig Kelley, Franz Kirsch, Dag Kristian, Mark Land, J. D.\nLaub, John P. Linderman, Tim Maher, Juha Muilu, Sergey Novoselov, Andy Oram, Ji Y Park, Eric\nPromislow, Nikolaus Rath, Stefan Ram, Slaven Rezic, Dag Kristian Rognlien, Stewart Russell,\nSlaven Rezic, Chris Rosin, Pasha Sadri, Ilya Sandler, Bob J.A. Schijvenaars, Ross Smith, Frank\nTobin, Greg Ward, Rich Williams, Rick Wise.\n\nThe matching algorithm was developed by Udi Manber, Sun Wu, and Burra Gopal in the Department of\nComputer Science, University of Arizona.\n",
            "subsections": []
        },
        "AUTHOR": {
            "content": "Jarkko Hietaniemi <jhi@iki.fi>\n",
            "subsections": []
        },
        "COPYRIGHT AND LICENSE": {
            "content": "Copyright 2001-2013 by Jarkko Hietaniemi\n\nThis library is free software; you can redistribute it and/or modify under either the terms of\nthe Artistic License 2.0, or the GNU Library General Public License, Version 2. See the files\nArtistic and LGPL for more details.\n\nFurthermore: no warranties or obligations of any kind are given, and the separate file COPYRIGHT\nmust be included intact in all copies and derived materials.\n",
            "subsections": []
        }
    },
    "summary": "String::Approx - Perl extension for approximate matching (fuzzy matching)",
    "flags": [],
    "examples": [],
    "see_also": []
}