{
    "content": [
        {
            "type": "text",
            "text": "# Unicode::Normalize (perldoc)\n\n**Summary:** Unicode::Normalize - Unicode Normalization Forms\n\n**Synopsis:** (1) using function names exported by default:\nuse Unicode::Normalize;\n$NFDstring  = NFD($string);  # Normalization Form D\n$NFCstring  = NFC($string);  # Normalization Form C\n$NFKDstring = NFKD($string); # Normalization Form KD\n$NFKCstring = NFKC($string); # Normalization Form KC\n(2) using function names exported on request:\nuse Unicode::Normalize 'normalize';\n$NFDstring  = normalize('D',  $string);  # Normalization Form D\n$NFCstring  = normalize('C',  $string);  # Normalization Form C\n$NFKDstring = normalize('KD', $string);  # Normalization Form KD\n$NFKCstring = normalize('KC', $string);  # Normalization Form KC\n\n## Section Outline\n\n- **NAME** (2 lines)\n- **SYNOPSIS** (18 lines)\n- **DESCRIPTION** (10 lines) — 4 subsections\n  - Normalization Forms (39 lines)\n  - Decomposition and Composition (98 lines)\n  - Quick Check (70 lines)\n  - Character Data (62 lines)\n- **EXPORT** (4 lines)\n- **CAVEATS** (41 lines)\n- **AUTHOR** (6 lines)\n- **LICENSE** (3 lines)\n- **SEE ALSO** (18 lines)\n\n## Full Content\n\n### NAME\n\nUnicode::Normalize - Unicode Normalization Forms\n\n### SYNOPSIS\n\n(1) using function names exported by default:\n\nuse Unicode::Normalize;\n\n$NFDstring  = NFD($string);  # Normalization Form D\n$NFCstring  = NFC($string);  # Normalization Form C\n$NFKDstring = NFKD($string); # Normalization Form KD\n$NFKCstring = NFKC($string); # Normalization Form KC\n\n(2) using function names exported on request:\n\nuse Unicode::Normalize 'normalize';\n\n$NFDstring  = normalize('D',  $string);  # Normalization Form D\n$NFCstring  = normalize('C',  $string);  # Normalization Form C\n$NFKDstring = normalize('KD', $string);  # Normalization Form KD\n$NFKCstring = normalize('KC', $string);  # Normalization Form KC\n\n### DESCRIPTION\n\nParameters:\n\n$string is used as a string under character semantics (see 
perlunicode).\n\n$codepoint should be an unsigned integer representing a Unicode code point.\n\nNote: XSUB and pure Perl interpret a $codepoint that is not a plain unsigned integer\ndifferently. XSUB converts $codepoint to an unsigned integer, but pure Perl\ndoes not. Do not pass a floating-point number or a negative value as $codepoint.\n\n#### Normalization Forms\n\n\"$NFDstring = NFD($string)\"\nIt returns the Normalization Form D (formed by canonical decomposition).\n\n\"$NFCstring = NFC($string)\"\nIt returns the Normalization Form C (formed by canonical decomposition followed by canonical\ncomposition).\n\n\"$NFKDstring = NFKD($string)\"\nIt returns the Normalization Form KD (formed by compatibility decomposition).\n\n\"$NFKCstring = NFKC($string)\"\nIt returns the Normalization Form KC (formed by compatibility decomposition followed by\ncanonical composition).\n\n\"$FCDstring = FCD($string)\"\nIf the given string is in FCD (\"Fast C or D\" form; cf. UTN #5), it returns the string\nwithout modification; otherwise it returns an FCD string.\n\nNote: The FCD form of a string is not always unique; several distinct FCD strings may be\nequivalent to each other. \"FCD()\" will return one of these equivalent forms.\n\n\"$FCCstring = FCC($string)\"\nIt returns the FCC form (\"Fast C Contiguous\"; cf. 
UTN #5).\n\nNote: Like the four normalization forms (NF*), FCC is unique.\n\n\"$normalizedstring = normalize($formname, $string)\"\nIt returns $string normalized to the form named by $formname.\n\n$formname must be one of the following names:\n\n'C'  or 'NFC'  for Normalization Form C  (UAX #15)\n'D'  or 'NFD'  for Normalization Form D  (UAX #15)\n'KC' or 'NFKC' for Normalization Form KC (UAX #15)\n'KD' or 'NFKD' for Normalization Form KD (UAX #15)\n\n'FCD'          for \"Fast C or D\" Form  (UTN #5)\n'FCC'          for \"Fast C Contiguous\" (UTN #5)\n\n#### Decomposition and Composition\n\n\"$decomposedstring = decompose($string [, $useCompatMapping])\"\nIt returns the concatenation of the decomposition of each character in the string.\n\nIf the second parameter (a boolean) is omitted or false, the decomposition is canonical\ndecomposition; if it is true, the decomposition is compatibility decomposition.\n\nThe string returned is not always in NFD/NFKD. Reordering may be required.\n\n$NFDstring  = reorder(decompose($string));    # eq. to NFD()\n$NFKDstring = reorder(decompose($string, 1)); # eq. 
to NFKD()\n\n\"$reorderedstring = reorder($string)\"\nIt returns the result of reordering the combining characters according to Canonical Ordering\nBehavior.\n\nFor example, when you have a list of NFD/NFKD strings, you can get the concatenated NFD/NFKD\nstring from them, by saying\n\n$concatNFD  = reorder(join '', @NFDstrings);\n$concatNFKD = reorder(join '', @NFKDstrings);\n\n\"$composedstring = compose($string)\"\nIt returns the result of canonical composition without applying any decomposition.\n\nFor example, when you have a NFD/NFKD string, you can get its NFC/NFKC string, by saying\n\n$NFCstring  = compose($NFDstring);\n$NFKCstring = compose($NFKDstring);\n\n\"($processed, $unprocessed) = splitOnLastStarter($normalized)\"\nIt returns two strings: the first one, $processed, is a part before the last starter, and\nthe second one, $unprocessed is another part after the first part. A starter is a character\nhaving a combining class of zero (see UAX #15).\n\nNote that $processed may be empty (when $normalized contains no starter or starts with the\nlast starter), and then $unprocessed should be equal to the entire $normalized.\n\nWhen you have a $normalized string and an $unnormalized string following it, a simple\nconcatenation is wrong:\n\n$concat = $normalized . normalize($form, $unnormalized); # wrong!\n\nInstead of it, do like this:\n\n($processed, $unprocessed) = splitOnLastStarter($normalized);\n$concat = $processed . 
normalize($form,$unprocessed.$unnormalized);\n\n\"splitOnLastStarter()\" should be called with a pre-normalized parameter $normalized that is\nalready in the form named by $form.\n\nIf you have an array @string whose elements should be concatenated and then normalized, you\ncan do this:\n\nmy $result = \"\";\nmy $unproc = \"\";\nforeach my $str (@string) {\n    $unproc .= $str;\n    my $n = normalize($form, $unproc);\n    my($p, $u) = splitOnLastStarter($n);\n    $result .= $p;\n    $unproc  = $u;\n}\n$result .= $unproc;\n# instead of normalize($form, join('', @string))\n\n\"$processed = normalize_partial($form, $unprocessed)\"\nA wrapper for the combination of \"normalize()\" and \"splitOnLastStarter()\". Note that\n$unprocessed will be modified as a side-effect.\n\nIf you have an array @string whose elements should be concatenated and then normalized, you\ncan do this:\n\nmy $result = \"\";\nmy $unproc = \"\";\nforeach my $str (@string) {\n    $unproc .= $str;\n    $result .= normalize_partial($form, $unproc);\n}\n$result .= $unproc;\n# instead of normalize($form, join('', @string))\n\n\"$processed = NFD_partial($unprocessed)\"\nIt behaves like \"normalize_partial('NFD', $unprocessed)\". Note that $unprocessed will be\nmodified as a side-effect.\n\n\"$processed = NFC_partial($unprocessed)\"\nIt behaves like \"normalize_partial('NFC', $unprocessed)\". Note that $unprocessed will be\nmodified as a side-effect.\n\n\"$processed = NFKD_partial($unprocessed)\"\nIt behaves like \"normalize_partial('NFKD', $unprocessed)\". Note that $unprocessed will be\nmodified as a side-effect.\n\n\"$processed = NFKC_partial($unprocessed)\"\nIt behaves like \"normalize_partial('NFKC', $unprocessed)\". 
Note that $unprocessed will be\nmodified as a side-effect.\n\n#### Quick Check\n\n(see Annex 8, UAX #15; and DerivedNormalizationProps.txt)\n\nThe following functions check whether the string is in that normalization form.\n\nThe result returned will be one of the following:\n\nYES     The string is in that normalization form.\nNO      The string is not in that normalization form.\nMAYBE   Dubious. Maybe yes, maybe no.\n\n\"$result = checkNFD($string)\"\nIt returns true (1) if \"YES\"; false (\"empty string\") if \"NO\".\n\n\"$result = checkNFC($string)\"\nIt returns true (1) if \"YES\"; false (\"empty string\") if \"NO\"; \"undef\" if \"MAYBE\".\n\n\"$result = checkNFKD($string)\"\nIt returns true (1) if \"YES\"; false (\"empty string\") if \"NO\".\n\n\"$result = checkNFKC($string)\"\nIt returns true (1) if \"YES\"; false (\"empty string\") if \"NO\"; \"undef\" if \"MAYBE\".\n\n\"$result = checkFCD($string)\"\nIt returns true (1) if \"YES\"; false (\"empty string\") if \"NO\".\n\n\"$result = checkFCC($string)\"\nIt returns true (1) if \"YES\"; false (\"empty string\") if \"NO\"; \"undef\" if \"MAYBE\".\n\nNote: If a string is not in FCD, it must not be in FCC. So \"checkFCC($notFCDstring)\"\nshould return \"NO\".\n\n\"$result = check($formname, $string)\"\nIt returns true (1) if \"YES\"; false (\"empty string\") if \"NO\"; \"undef\" if \"MAYBE\".\n\nAs $formname, one of the following names must be given.\n\n'C'  or 'NFC'  for Normalization Form C  (UAX #15)\n'D'  or 'NFD'  for Normalization Form D  (UAX #15)\n'KC' or 'NFKC' for Normalization Form KC (UAX #15)\n'KD' or 'NFKD' for Normalization Form KD (UAX #15)\n\n'FCD'          for \"Fast C or D\" Form  (UTN #5)\n'FCC'          for \"Fast C Contiguous\" (UTN #5)\n\nNote\n\nIn the cases of NFD, NFKD, and FCD, the answer must be either \"YES\" or \"NO\". The answer \"MAYBE\"\nmay be returned in the cases of NFC, NFKC, and FCC.\n\nA \"MAYBE\" string should contain at least one combining character or the like. 
For example,\n\"COMBINING ACUTE ACCENT\" has the MAYBENFC/MAYBENFKC property.\n\nBoth \"checkNFC(\"A\\N{COMBINING ACUTE ACCENT}\")\" and \"checkNFC(\"B\\N{COMBINING ACUTE ACCENT}\")\"\nwill return \"MAYBE\". \"A\\N{COMBINING ACUTE ACCENT}\" is not in NFC (its NFC is \"\\N{LATIN CAPITAL\nLETTER A WITH ACUTE}\"), while \"B\\N{COMBINING ACUTE ACCENT}\" is in NFC.\n\nIf you want to check exactly, compare the string with its NFC/NFKC/FCC.\n\nif ($string eq NFC($string)) {\n# $string is exactly normalized in NFC;\n} else {\n# $string is not normalized in NFC;\n}\n\nif ($string eq NFKC($string)) {\n# $string is exactly normalized in NFKC;\n} else {\n# $string is not normalized in NFKC;\n}\n\n#### Character Data\n\nThese functions are interface of character data used internally. If you want only to get Unicode\nnormalization forms, you don't need call them yourself.\n\n\"$canonicaldecomposition = getCanon($codepoint)\"\nIf the character is canonically decomposable (including Hangul Syllables), it returns the\n(full) canonical decomposition as a string. Otherwise it returns \"undef\".\n\nNote: According to the Unicode standard, the canonical decomposition of the character that\nis not canonically decomposable is same as the character itself.\n\n\"$compatibilitydecomposition = getCompat($codepoint)\"\nIf the character is compatibility decomposable (including Hangul Syllables), it returns the\n(full) compatibility decomposition as a string. 
Otherwise it returns \"undef\".\n\nNote: According to the Unicode standard, the compatibility decomposition of the character\nthat is not compatibility decomposable is same as the character itself.\n\n\"$codepointcomposite = getComposite($codepointhere, $codepointnext)\"\nIf two characters here and next (as code points) are composable (including Hangul\nJamo/Syllables and Composition Exclusions), it returns the code point of the composite.\n\nIf they are not composable, it returns \"undef\".\n\n\"$combiningclass = getCombinClass($codepoint)\"\nIt returns the combining class (as an integer) of the character.\n\n\"$maybecomposedwithprevchar = isComp2nd($codepoint)\"\nIt returns a boolean whether the character of the specified codepoint may be composed with\nthe previous one in a certain composition (including Hangul Compositions, but excluding\nComposition Exclusions and Non-Starter Decompositions).\n\n\"$isexclusion = isExclusion($codepoint)\"\nIt returns a boolean whether the code point is a composition exclusion.\n\n\"$issingleton = isSingleton($codepoint)\"\nIt returns a boolean whether the code point is a singleton\n\n\"$isnonstarterdecomposition = isNonStDecomp($codepoint)\"\nIt returns a boolean whether the code point has Non-Starter Decomposition.\n\n\"$isFullCompositionExclusion = isCompEx($codepoint)\"\nIt returns a boolean of the derived property CompEx (FullCompositionExclusion). 
This\nproperty is generated from Composition Exclusions + Singletons + Non-Starter Decompositions.\n\n\"$NFDisNO = isNFD_NO($codepoint)\"\nIt returns a boolean of the derived property NFD_NO (NFD_Quick_Check=No).\n\n\"$NFCisNO = isNFC_NO($codepoint)\"\nIt returns a boolean of the derived property NFC_NO (NFC_Quick_Check=No).\n\n\"$NFCisMAYBE = isNFC_MAYBE($codepoint)\"\nIt returns a boolean of the derived property NFC_MAYBE (NFC_Quick_Check=Maybe).\n\n\"$NFKDisNO = isNFKD_NO($codepoint)\"\nIt returns a boolean of the derived property NFKD_NO (NFKD_Quick_Check=No).\n\n\"$NFKCisNO = isNFKC_NO($codepoint)\"\nIt returns a boolean of the derived property NFKC_NO (NFKC_Quick_Check=No).\n\n\"$NFKCisMAYBE = isNFKC_MAYBE($codepoint)\"\nIt returns a boolean of the derived property NFKC_MAYBE (NFKC_Quick_Check=Maybe).\n\n### EXPORT\n\n\"NFC\", \"NFD\", \"NFKC\", \"NFKD\": by default.\n\n\"normalize\" and some other functions: on request.\n\n### CAVEATS\n\nPerl's version vs. Unicode version\nSince this module refers to perl core's Unicode database in the directory /lib/unicore (or\nformerly /lib/unicode), the Unicode version of normalization implemented by this module\ndepends on what has been compiled into your perl. The following table lists the default\nUnicode version that comes with various perl versions. (It is possible to change the Unicode\nversion in any perl version to be any earlier Unicode version, so one could cause Unicode\n3.2 to be used in any perl version starting with 5.8.0. 
Read\n$Config{privlib}/unicore/README.perl for details.)\n\nperl's version     implemented Unicode version\n5.6.1              3.0.1\n5.7.2              3.1.0\n5.7.3              3.1.1 (normalization is same as 3.1.0)\n5.8.0              3.2.0\n5.8.1-5.8.3        4.0.0\n5.8.4-5.8.6        4.0.1 (normalization is same as 4.0.0)\n5.8.7-5.8.8        4.1.0\n5.10.0             5.0.0\n5.8.9, 5.10.1      5.1.0\n5.12.x             5.2.0\n5.14.x             6.0.0\n5.16.x             6.1.0\n5.18.x             6.2.0\n5.20.x             6.3.0\n5.22.x             7.0.0\n\nCorrection of decomposition mapping\nIn older Unicode versions, a small number of characters (all of which are CJK compatibility\nideographs, as far as is known) may have an erroneous decomposition mapping (see\nNormalizationCorrections.txt). This module neither refers to\nNormalizationCorrections.txt nor provides any specific version of normalization. Therefore\nthis module running on an older perl with an older Unicode database may use the erroneous\ndecomposition mapping, blindly conforming to the Unicode database.\n\nRevised definition of canonical composition\nIn Unicode 4.1.0, the definition D2 of canonical composition (which affects NFC and NFKC)\nwas changed (see Public Review Issue #29 and recent UAX #15). This module has used the\nnewer definition since version 0.07 (Oct 31, 2001). This module will not support\nnormalization according to the older definition, even if the Unicode version implemented by\nperl is lower than 4.1.0.\n\n### AUTHOR\n\nSADAHIRO Tomoyuki <SADAHIRO@cpan.org>\n\nCurrently maintained by <perl5-porters@perl.org>\n\nCopyright(C) 2001-2012, SADAHIRO Tomoyuki. Japan. 
All rights reserved.\n\n### LICENSE\n\nThis module is free software; you can redistribute it and/or modify it under the same terms as\nPerl itself.\n\n### SEE ALSO\n\n<http://www.unicode.org/reports/tr15/>\nUnicode Normalization Forms - UAX #15\n\n<http://www.unicode.org/Public/UNIDATA/CompositionExclusions.txt>\nComposition Exclusion Table\n\n<http://www.unicode.org/Public/UNIDATA/DerivedNormalizationProps.txt>\nDerived Normalization Properties\n\n<http://www.unicode.org/Public/UNIDATA/NormalizationCorrections.txt>\nNormalization Corrections\n\n<http://www.unicode.org/review/pr-29.html>\nPublic Review Issue #29: Normalization Issue\n\n<http://www.unicode.org/notes/tn5/>\nCanonical Equivalence in Applications - UTN #5\n\n"
        }
    ],
    "structuredContent": {
        "command": "Unicode::Normalize",
        "section": "",
        "mode": "perldoc",
        "summary": "Unicode::Normalize - Unicode Normalization Forms",
        "synopsis": "(1) using function names exported by default:\nuse Unicode::Normalize;\n$NFDstring  = NFD($string);  # Normalization Form D\n$NFCstring  = NFC($string);  # Normalization Form C\n$NFKDstring = NFKD($string); # Normalization Form KD\n$NFKCstring = NFKC($string); # Normalization Form KC\n(2) using function names exported on request:\nuse Unicode::Normalize 'normalize';\n$NFDstring  = normalize('D',  $string);  # Normalization Form D\n$NFCstring  = normalize('C',  $string);  # Normalization Form C\n$NFKDstring = normalize('KD', $string);  # Normalization Form KD\n$NFKCstring = normalize('KC', $string);  # Normalization Form KC",
        "tldr_summary": null,
        "tldr_examples": [],
        "tldr_source": null,
        "flags": [],
        "examples": [],
        "see_also": [],
        "section_outline": [
            {
                "name": "NAME",
                "lines": 2,
                "subsections": []
            },
            {
                "name": "SYNOPSIS",
                "lines": 18,
                "subsections": []
            },
            {
                "name": "DESCRIPTION",
                "lines": 10,
                "subsections": [
                    {
                        "name": "Normalization Forms",
                        "lines": 39
                    },
                    {
                        "name": "Decomposition and Composition",
                        "lines": 98
                    },
                    {
                        "name": "Quick Check",
                        "lines": 70
                    },
                    {
                        "name": "Character Data",
                        "lines": 62
                    }
                ]
            },
            {
                "name": "EXPORT",
                "lines": 4,
                "subsections": []
            },
            {
                "name": "CAVEATS",
                "lines": 41,
                "subsections": []
            },
            {
                "name": "AUTHOR",
                "lines": 6,
                "subsections": []
            },
            {
                "name": "LICENSE",
                "lines": 3,
                "subsections": []
            },
            {
                "name": "SEE ALSO",
                "lines": 18,
                "subsections": []
            }
        ]
    }
}