{
    "content": [
        {
            "type": "text",
            "text": "# Lingua::EN::Sentence (perldoc)\n\n## NAME\n\nLingua::EN::Sentence - split text into sentences\n\n## SYNOPSIS\n\nuse Lingua::EN::Sentence qw( get_sentences add_acronyms );\nadd_acronyms('lt','gen');              ## adding support for 'Lt. Gen.'\nmy $sentences=get_sentences($text);    ## Get the sentences.\nforeach my $sentence (@$sentences) {\n    ## do something with $sentence\n}\n\n## DESCRIPTION\n\nThe \"Lingua::EN::Sentence\" module contains the function get_sentences, which splits text into\nits constituent sentences, based on a regular expression and a list of abbreviations (built in\nand given).\n\n## Sections\n\n- **NAME**\n- **SYNOPSIS**\n- **DESCRIPTION**\n- **ALGORITHM**\n- **FUNCTIONS** (7 subsections)\n- **Acronym/Abbreviations list**\n- **FUTURE WORK**\n- **SEE ALSO**\n- **REPOSITORY**\n- **AUTHOR**\n- **COPYRIGHT AND LICENSE**\n\nUse structuredContent.sections for detailed options, examples, and full documentation.\n"
        }
    ],
    "structuredContent": {
        "command": "Lingua::EN::Sentence",
        "section": "",
        "mode": "perldoc",
        "summary": "Lingua::EN::Sentence - split text into sentences",
        "synopsis": "use Lingua::EN::Sentence qw( get_sentences add_acronyms );\nadd_acronyms('lt','gen');              ## adding support for 'Lt. Gen.'\nmy $sentences=get_sentences($text);    ## Get the sentences.\nforeach my $sentence (@$sentences) {\n    ## do something with $sentence\n}",
        "tldr_summary": null,
        "tldr_examples": [],
        "tldr_source": null,
        "flags": [],
        "examples": [],
        "see_also": [],
        "section_outline": [
            {
                "name": "NAME",
                "lines": 2,
                "subsections": []
            },
            {
                "name": "SYNOPSIS",
                "lines": 8,
                "subsections": []
            },
            {
                "name": "DESCRIPTION",
                "lines": 9,
                "subsections": []
            },
            {
                "name": "ALGORITHM",
                "lines": 10,
                "subsections": []
            },
            {
                "name": "FUNCTIONS",
                "lines": 2,
                "subsections": [
                    {
                        "name": "get_sentences",
                        "lines": 5
                    },
                    {
                        "name": "add_acronyms",
                        "lines": 10
                    },
                    {
                        "name": "get_acronyms",
                        "lines": 2
                    },
                    {
                        "name": "set_acronyms",
                        "lines": 3
                    },
                    {
                        "name": "get_EOS",
                        "lines": 4
                    },
                    {
                        "name": "set_EOS",
                        "lines": 2
                    },
                    {
                        "name": "set_locale",
                        "lines": 14
                    }
                ]
            },
            {
                "name": "Acronym/Abbreviations list",
                "lines": 6,
                "subsections": []
            },
            {
                "name": "FUTURE WORK",
                "lines": 5,
                "subsections": []
            },
            {
                "name": "SEE ALSO",
                "lines": 2,
                "subsections": []
            },
            {
                "name": "REPOSITORY",
                "lines": 2,
                "subsections": []
            },
            {
                "name": "AUTHOR",
                "lines": 4,
                "subsections": []
            },
            {
                "name": "COPYRIGHT AND LICENSE",
                "lines": 6,
                "subsections": []
            }
        ],
        "sections": {
            "NAME": {
                "content": "Lingua::EN::Sentence - split text into sentences\n",
                "subsections": []
            },
            "SYNOPSIS": {
                "content": "use Lingua::EN::Sentence qw( get_sentences add_acronyms );\n\nadd_acronyms('lt','gen');              ## adding support for 'Lt. Gen.'\nmy $sentences=get_sentences($text);    ## Get the sentences.\nforeach my $sentence (@$sentences) {\n    ## do something with $sentence\n}\n",
                "subsections": []
            },
            "DESCRIPTION": {
                "content": "The \"Lingua::EN::Sentence\" module contains the function get_sentences, which splits text into\nits constituent sentences, based on a regular expression and a list of abbreviations (built in\nand given).\n\nCertain well-known exceptions, such as abbreviations, may cause incorrect segmentation. Some\nof them are already built into this code and are handled. Still, if you see words that cause\nthe get_sentences function to split incorrectly, you can add them to the module so that it\nrecognizes them.\n",
                "subsections": []
            },
            "ALGORITHM": {
                "content": "Basically, I use a 'brute' regular expression to split the text into sentences. (Well, nothing\nis yet split - I just mark the end-of-sentence). Then I look into a set of rules which decide\nwhen an end-of-sentence is justified and when it's a mistake. In case of a mistake, the\nend-of-sentence mark is removed.\n\nWhat are such mistakes? Cases of abbreviations, for example. I have a list of such abbreviations\n(please see public globals below for a list), and more general rules (for example, the\nabbreviations 'i.e.' and 'e.g.' need not be in the list, as a special rule takes care of all\nsingle letter abbreviations).\n",
                "subsections": []
            },
            "FUNCTIONS": {
                "content": "All functions used should be requested in the 'use' clause. None is exported by default.\n",
                "subsections": [
                    {
                        "name": "get_sentences",
                        "content": "The get_sentences function takes a scalar containing ASCII text as an argument and returns a\nreference to an array of sentences that the text has been split into. Returned sentences\nare trimmed of white space at the beginning and end. Strings with no alphanumeric\ncharacters in them are not returned as sentences.\n"
                    },
                    {
                        "name": "add_acronyms",
                        "content": "This function is used for adding acronyms not supported by this code. The input should be\nregular expressions for matching the desired acronyms, but should not include the final\nperiod (\".\"). So, for example, \"blv?d\" matches \"blvd.\" and \"bld.\". \"a\\.mlf\" will match\n\"a.mlf.\". You do not need to bother with acronyms consisting of single letters and dots\n(e.g. \"U.S.A.\"), as these are found automatically. Note also that acronyms are searched for\non a case insensitive basis.\n\nPlease see 'Acronym/Abbreviations list' section for the abbreviations already supported by\nthis module.\n"
                    },
                    {
                        "name": "get_acronyms",
                        "content": "This function will return the defined list of acronyms.\n"
                    },
                    {
                        "name": "set_acronyms",
                        "content": "This function replaces the predefined acronym list with the given list. See \"add_acronyms\"\nfor details on the input specifications.\n"
                    },
                    {
                        "name": "get_EOS",
                        "content": "This function returns the string used to mark the end of a sentence. You might want to see\nwhat it is, and to make sure your text doesn't contain it. You can use set_EOS() to change\nthe end-of-sentence string to whatever you desire.\n"
                    },
                    {
                        "name": "set_EOS",
                        "content": "This function alters the end-of-sentence string used to mark the end of sentences.\n"
                    },
                    {
                        "name": "set_locale",
                        "content": "Takes a locale string, for example \"fr_CA.ISO8859-1\" for Canadian French using character set\nISO8859-1. Returns a reference to a hash containing the current locale formatting values.\nReturns undef if got undef.\n\nThe following will set the LC_COLLATE behaviour to Argentinian Spanish. NOTE: The naming and\navailability of locales depends on your operating system. Please consult the perllocale\nmanpage for how to find out which locales are available in your system.\n\n$loc = set_locale( \"es_AR.ISO8859-1\" );\n\nThis actually does this:\n\n$loc = setlocale( LC_ALL, \"es_AR.ISO8859-1\" );\n"
                    }
                ]
            },
            "Acronym/Abbreviations list": {
                "content": "You can use the get_acronyms() function to get the acronyms. The list has become too long to\nspecify in the documentation.\n\nIf I come across a good general-purpose list - I'll incorporate it into this module. Feel free\nto suggest such lists.\n",
                "subsections": []
            },
            "FUTURE WORK": {
                "content": "[1] Object-oriented usage\n[2] Supporting more than just English/French\n[3] Code optimization. Currently everything is RE based, and the REs are not well optimized\n[4] Possibly use more semantic heuristics for detecting the beginning of a sentence\n",
                "subsections": []
            },
            "SEE ALSO": {
                "content": "Text::Sentence\n",
                "subsections": []
            },
            "REPOSITORY": {
                "content": "<https://github.com/kimryan/Lingua-EN-Sentence>\n",
                "subsections": []
            },
            "AUTHOR": {
                "content": "Shlomo Yona shlomo@cs.haifa.ac.il\n\nCurrently being maintained by Kim Ryan, kimryan at CPAN d o t org\n",
                "subsections": []
            },
            "COPYRIGHT AND LICENSE": {
                "content": "Copyright (c) 2001-2016 Shlomo Yona. All rights reserved. Copyright (c) 2018 Kim Ryan. All\nrights reserved.\n\nThis library is free software; you can redistribute it and/or modify it under the same terms as\nPerl itself.\n",
                "subsections": []
            }
        }
    }
}