{
    "content": [
        {
            "type": "text",
            "text": "# HTML::Scrubber (perldoc)\n\n## NAME\n\nHTML::Scrubber - Perl extension for scrubbing/sanitizing HTML\n\n## SYNOPSIS\n\nuse HTML::Scrubber;\nmy $scrubber = HTML::Scrubber->new( allow => [ qw[ p b i u hr br ] ] );\nprint $scrubber->scrub('<p><b>bold</b> <em>missing</em></p>');\n# output is: <p><b>bold</b> </p>\n# more complex input\nmy $html = q[\n<style type=\"text/css\"> BAD { background: #666; color: #666;} </style>\n<script language=\"javascript\"> alert(\"Hello, I am EVIL!\");    </script>\n<HR>\na   => <a href=1>link </a>\nbr  => <br>\nb   => <B> bold </B>\nu   => <U> UNDERLINE </U>\n];\nprint $scrubber->scrub($html);\n$scrubber->deny( qw[ p b i u hr br ] );\nprint $scrubber->scrub($html);\n\n## DESCRIPTION\n\nIf you want to \"scrub\" or \"sanitize\" html input in a reliable and flexible fashion, then this\nmodule is for you.\n\n## Sections\n\n- **NAME**\n- **VERSION**\n- **SYNOPSIS**\n- **DESCRIPTION**\n- **METHODS**\n- **SEE ALSO**\n- **VERSION REQUIREMENTS**\n- **CONTRIBUTING**\n- **AUTHORS**\n- **COPYRIGHT AND LICENSE**\n- **SUPPORT** (3 subsections)\n- **CONTRIBUTORS**\n\nUse structuredContent.sections for detailed options, examples, and full documentation.\n"
        }
    ],
    "structuredContent": {
        "command": "HTML::Scrubber",
        "section": "",
        "mode": "perldoc",
        "summary": "HTML::Scrubber - Perl extension for scrubbing/sanitizing HTML",
        "synopsis": "use HTML::Scrubber;\nmy $scrubber = HTML::Scrubber->new( allow => [ qw[ p b i u hr br ] ] );\nprint $scrubber->scrub('<p><b>bold</b> <em>missing</em></p>');\n# output is: <p><b>bold</b> </p>\n# more complex input\nmy $html = q[\n<style type=\"text/css\"> BAD { background: #666; color: #666;} </style>\n<script language=\"javascript\"> alert(\"Hello, I am EVIL!\");    </script>\n<HR>\na   => <a href=1>link </a>\nbr  => <br>\nb   => <B> bold </B>\nu   => <U> UNDERLINE </U>\n];\nprint $scrubber->scrub($html);\n$scrubber->deny( qw[ p b i u hr br ] );\nprint $scrubber->scrub($html);",
        "tldr_summary": null,
        "tldr_examples": [],
        "tldr_source": null,
        "flags": [],
        "examples": [],
        "see_also": [],
        "section_outline": [
            {
                "name": "NAME",
                "lines": 2,
                "subsections": []
            },
            {
                "name": "VERSION",
                "lines": 2,
                "subsections": []
            },
            {
                "name": "SYNOPSIS",
                "lines": 23,
                "subsections": []
            },
            {
                "name": "DESCRIPTION",
                "lines": 6,
                "subsections": []
            },
            {
                "name": "METHODS",
                "lines": 241,
                "subsections": []
            },
            {
                "name": "SEE ALSO",
                "lines": 4,
                "subsections": []
            },
            {
                "name": "VERSION REQUIREMENTS",
                "lines": 10,
                "subsections": []
            },
            {
                "name": "CONTRIBUTING",
                "lines": 11,
                "subsections": []
            },
            {
                "name": "AUTHORS",
                "lines": 6,
                "subsections": []
            },
            {
                "name": "COPYRIGHT AND LICENSE",
                "lines": 5,
                "subsections": []
            },
            {
                "name": "SUPPORT",
                "lines": 1,
                "subsections": [
                    {
                        "name": "Perldoc",
                        "lines": 4
                    },
                    {
                        "name": "Websites",
                        "lines": 66
                    },
                    {
                        "name": "Source Code",
                        "lines": 8
                    }
                ]
            },
            {
                "name": "CONTRIBUTORS",
                "lines": 18,
                "subsections": []
            }
        ],
        "sections": {
            "NAME": {
                "content": "HTML::Scrubber - Perl extension for scrubbing/sanitizing HTML\n",
                "subsections": []
            },
            "VERSION": {
                "content": "version 0.19\n",
                "subsections": []
            },
            "SYNOPSIS": {
                "content": "use HTML::Scrubber;\n\nmy $scrubber = HTML::Scrubber->new( allow => [ qw[ p b i u hr br ] ] );\nprint $scrubber->scrub('<p><b>bold</b> <em>missing</em></p>');\n# output is: <p><b>bold</b> </p>\n\n# more complex input\nmy $html = q[\n<style type=\"text/css\"> BAD { background: #666; color: #666;} </style>\n<script language=\"javascript\"> alert(\"Hello, I am EVIL!\");    </script>\n<HR>\na   => <a href=1>link </a>\nbr  => <br>\nb   => <B> bold </B>\nu   => <U> UNDERLINE </U>\n];\n\nprint $scrubber->scrub($html);\n\n$scrubber->deny( qw[ p b i u hr br ] );\n\nprint $scrubber->scrub($html);\n",
                "subsections": []
            },
            "DESCRIPTION": {
                "content": "If you want to \"scrub\" or \"sanitize\" html input in a reliable and flexible fashion, then this\nmodule is for you.\n\nI wasn't satisfied with HTML::Sanitizer because it is based on HTML::TreeBuilder, so I thought\nI'd write something similar that works directly with HTML::Parser.\n",
                "subsections": []
            },
            "METHODS": {
                "content": "First a note on documentation: just study the EXAMPLE below. It's all the documentation you\ncould need.\n\nAlso, be sure to read all the comments as well as \"How does it work?\".\n\nIf you're new to perl, good luck to you.\n\nnew\nmy $scrubber = HTML::Scrubber->new( allow => [ qw[ p b i u hr br ] ] );\n\nBuild a new HTML::Scrubber. The arguments are the initial values for the following directives:\n\n*   default\n\n*   allow\n\n*   deny\n\n*   rules\n\n*   process\n\n*   comment\n\ncomment\nwarn \"comments are  \", $p->comment ? 'allowed' : 'not allowed';\n$p->comment(0);  # off by default\n\nprocess\nwarn \"process instructions are  \", $p->process ? 'allowed' : 'not allowed';\n$p->process(0);  # off by default\n\nscript\nwarn \"script tags (and everything in between) are suppressed\"\nif $p->script;      # off by default\n$p->script( 0 || 1 );\n\nPlease note that this is implemented using HTML::Parser's \"ignore_elements\" function, so if\n\"script\" is set to true, all script tags encountered will be validated like all other tags.\n\nstyle\nwarn \"style tags (and everything in between) are suppressed\"\nif $p->style;       # off by default\n$p->style( 0 || 1 );\n\nPlease note that this is implemented using HTML::Parser's \"ignore_elements\" function, so if\n\"style\" is set to true, all style tags encountered will be validated like all other tags.\n\nallow\n$p->allow(qw[ t a g s ]);\n\ndeny\n$p->deny(qw[ t a g s ]);\n\nrules\n$p->rules(\nimg => {\nsrc => qr{^(?!http://)}i, # only relative image links allowed\nalt => 1,                 # alt attribute allowed\n'*' => 0,                 # deny all other attributes\n},\na => {\nhref => sub { ... },      # check or adjust with a callback\n},\nb => 1,\n...\n);\n\nUpdates a set of attribute rules. Each rule can be 1/0, a regular expression or a callback.\nValues longer than 1 char are treated as regexps. 
The callback is called with the following\narguments: the current object, tag name, attribute name, and attribute value; the callback\nshould return an empty list to drop the attribute, \"undef\" to keep it without a value, or a new\nscalar value.\n\ndefault\nprint \"default is \", $p->default();\n$p->default(1);      # allow tags by default\n$p->default(\nundef,           # don't change\n{                # default attribute rules\n'*' => 1,    # allow attributes by default\n}\n);\n\nscrub_file\n$html = $scrubber->scrub_file('foo.html');   ## returns giant string\ndie \"Eeek $!\" unless defined $html;  ## opening foo.html may have failed\n$scrubber->scrub_file('foo.html', 'new.html') or die \"Eeek $!\";\n$scrubber->scrub_file('foo.html', *STDOUT)\nor die \"Eeek $!\"\nif fileno STDOUT;\n\nscrub\nprint $scrubber->scrub($html);  ## returns giant string\n$scrubber->scrub($html, 'new.html') or die \"Eeek $!\";\n$scrubber->scrub($html, *STDOUT)\nor die \"Eeek $!\"\nif fileno STDOUT;\n\n*default* handler, used by both \"_scrub\" and \"_scrub_fh\". Moved all the common code (basically\nall of it) into a single routine for ease of maintenance.\n\n*default* handler, does the scrubbing if we're scrubbing out to a file. Now calls \"_scrub_str\"\nand pushes that out to a file.\n\n*default* handler, does the scrubbing if we're returning a giant string. 
Now calls \"_scrub_str\"\nand appends that to the output string.\n\nHow does it work?\nWhen a tag is encountered, HTML::Scrubber allows/denies the tag using the explicit rule if one\nexists.\n\nIf no explicit rule exists, Scrubber applies the default rule.\n\nIf an explicit rule exists, but it's a simple rule (1), then the default attribute rule is\napplied.\n\nEXAMPLE\n#!/usr/bin/perl -w\nuse HTML::Scrubber;\nuse strict;\n\nmy @allow = qw[ br hr b a ];\n\nmy @rules = (\nscript => 0,\nimg    => {\nsrc => qr{^(?!http://)}i,    # only relative image links allowed\nalt => 1,                    # alt attribute allowed\n'*' => 0,                    # deny all other attributes\n},\n);\n\nmy @default = (\n0 =>                             # default rule, deny all tags\n{\n'*'    => 1,                             # default rule, allow all attributes\n'href' => qr{^(?:http|https|ftp)://}i,\n'src'  => qr{^(?:http|https|ftp)://}i,\n\n#   If your perl doesn't have qr\n#   just use a string with length greater than 1\n'cite'        => '(?i-xsm:^(?:http|https|ftp):)',\n'language'    => 0,\n'name'        => 1,                                 # could be sneaky, but hey ;)\n'onblur'      => 0,\n'onchange'    => 0,\n'onclick'     => 0,\n'ondblclick'  => 0,\n'onerror'     => 0,\n'onfocus'     => 0,\n'onkeydown'   => 0,\n'onkeypress'  => 0,\n'onkeyup'     => 0,\n'onload'      => 0,\n'onmousedown' => 0,\n'onmousemove' => 0,\n'onmouseout'  => 0,\n'onmouseover' => 0,\n'onmouseup'   => 0,\n'onreset'     => 0,\n'onselect'    => 0,\n'onsubmit'    => 0,\n'onunload'    => 0,\n'src'         => 0,\n'type'        => 0,\n}\n);\n\nmy $scrubber = HTML::Scrubber->new();\n$scrubber->allow(@allow);\n$scrubber->rules(@rules);    # key/value pairs\n$scrubber->default(@default);\n$scrubber->comment(1);       # 1 allow, 0 deny\n\n## preferred way to create the same object\n$scrubber = HTML::Scrubber->new(\nallow   => \\@allow,\nrules   => \\@rules,\ndefault => \\@default,\ncomment => 1,\nprocess => 
0,\n);\n\nrequire Data::Dumper, die Data::Dumper::Dumper($scrubber) if @ARGV;\n\nmy $it = q[\n<?php   echo(\" EVIL EVIL EVIL \"); ?>    <!-- asdf -->\n<hr>\n<I FAKE=\"attribute\" > IN ITALICS WITH FAKE=\"attribute\" </I><br>\n<B> IN BOLD </B><br>\n<A NAME=\"evil\">\n<A HREF=\"javascript:alert('die die die');\">HREF=JAVA &lt;!&gt;</A>\n<br>\n<A HREF=\"image/bigone.jpg\" ONMOUSEOVER=\"alert('die die die');\">\n<IMG SRC=\"image/smallone.jpg\" ALT=\"ONMOUSEOVER JAVASCRIPT\">\n</A>\n</A> <br>\n];\n\nprint \"#original text\", $/, $it, $/;\nprint\n\"#scrubbed text (default \", $scrubber->default(),    # no arguments returns the current value\n\" comment \", $scrubber->comment(), \" process \", $scrubber->process(), \" )\", $/, $scrubber->scrub($it), $/;\n\n$scrubber->default(1);                                   # allow all tags by default\n$scrubber->comment(0);                                   # deny comments\n\nprint\n\"#scrubbed text (default \",\n$scrubber->default(),\n\" comment \",\n$scrubber->comment(),\n\" process \",\n$scrubber->process(),\n\" )\", $/,\n$scrubber->scrub($it),\n$/;\n\n$scrubber->process(1);    # allow process instructions (dangerous)\n$default[0] = 1;          # allow all tags by default\n$default[1]->{'*'} = 0;   # deny all attributes by default\n$scrubber->default(@default);    # set the default again\n\nprint\n\"#scrubbed text (default \",\n$scrubber->default(),\n\" comment \",\n$scrubber->comment(),\n\" process \",\n$scrubber->process(),\n\" )\", $/,\n$scrubber->scrub($it),\n$/;\n\nFUN\nIf you have Test::Inline (and you've installed HTML::Scrubber), try\n\npod2test Scrubber.pm >scrubber.t\nperl scrubber.t\n",
                "subsections": []
            },
            "SEE ALSO": {
                "content": "HTML::Parser, Test::Inline.\n\nThe HTML::Sanitizer module is no longer available on CPAN.\n",
                "subsections": []
            },
            "VERSION REQUIREMENTS": {
                "content": "As of version 0.14 I have added a perl minimum version requirement of 5.8. This is basically due\nto failures on the smokers' perl 5.6 installations, which appear to be down to installation\nmechanisms and requirements.\n\nSince I don't want to spend the time supporting a version that is so old (and may not work for\nreasons such as UTF support), I have added a \"use 5.008;\" to the main module.\n\nIf this is problematic I am very willing to accept patches to fix this up, although I do not\npersonally see a good reason to support a release that has been obsolete for 13 years.\n",
                "subsections": []
            },
            "CONTRIBUTING": {
                "content": "If you want to contribute to the development of this module, the code is on GitHub\n<http://github.com/nigelm/html-scrubber>. You'll need a perl environment with Dist::Zilla, and\nif you're just getting started, there's some documentation on using Vagrant and Perlbrew here\n<http://mrcaron.github.io/2015/03/06/Perl-CPAN-Pull-Request.html>.\n\nThere is now a \".perltidyrc\" and a \".tidyallrc\" file within the repository for the standard\nperltidy settings used - I will apply these before new releases. Please do not let formatting\nprevent you from sending in patches etc - this can be sorted out as part of the release process.\nInfo on \"tidyall\" can be found at\n<https://metacpan.org/pod/distribution/Code-TidyAll/bin/tidyall>.\n",
                "subsections": []
            },
            "AUTHORS": {
                "content": "*   Ruslan Zakirov <Ruslan.Zakirov@gmail.com>\n\n*   Nigel Metheringham <nigelm@cpan.org>\n\n*   D. H. <podmaster@cpan.org>\n",
                "subsections": []
            },
            "COPYRIGHT AND LICENSE": {
                "content": "This software is copyright (c) 2018 by Ruslan Zakirov, Nigel Metheringham, 2003-2004 D. H.\n\nThis is free software; you can redistribute it and/or modify it under the same terms as the Perl\n5 programming language system itself.\n",
                "subsections": []
            },
            "SUPPORT": {
                "content": "",
                "subsections": [
                    {
                        "name": "Perldoc",
                        "content": "You can find documentation for this module with the perldoc command.\n\nperldoc HTML::Scrubber\n"
                    },
                    {
                        "name": "Websites",
                        "content": "The following websites have more information about this module, and may be of help to you. As\nalways, in addition to those websites please use your favorite search engine to discover more\nresources.\n\n*   MetaCPAN\n\nA modern, open-source CPAN search engine, useful to view POD in HTML format.\n\n<https://metacpan.org/release/HTML-Scrubber>\n\n*   Search CPAN\n\nThe default CPAN search engine, useful to view POD in HTML format.\n\n<http://search.cpan.org/dist/HTML-Scrubber>\n\n*   RT: CPAN's Bug Tracker\n\nThe RT ( Request Tracker ) website is the default bug/issue tracking system for CPAN.\n\n<https://rt.cpan.org/Public/Dist/Display.html?Name=HTML-Scrubber>\n\n*   AnnoCPAN\n\nThe AnnoCPAN is a website that allows community annotations of Perl module documentation.\n\n<http://annocpan.org/dist/HTML-Scrubber>\n\n*   CPAN Ratings\n\nThe CPAN Ratings is a website that allows community ratings and reviews of Perl modules.\n\n<http://cpanratings.perl.org/d/HTML-Scrubber>\n\n*   CPANTS\n\nThe CPANTS is a website that analyzes the Kwalitee ( code metrics ) of a distribution.\n\n<http://cpants.cpanauthors.org/dist/HTML-Scrubber>\n\n*   CPAN Testers\n\nThe CPAN Testers is a network of smoke testers who run automated tests on uploaded CPAN\ndistributions.\n\n<http://www.cpantesters.org/distro/H/HTML-Scrubber>\n\n*   CPAN Testers Matrix\n\nThe CPAN Testers Matrix is a website that provides a visual overview of the test results for\na distribution on various Perls/platforms.\n\n<http://matrix.cpantesters.org/?dist=HTML-Scrubber>\n\n*   CPAN Testers Dependencies\n\nThe CPAN Testers Dependencies is a website that shows a chart of the test results of all\ndependencies for a distribution.\n\n<http://deps.cpantesters.org/?module=HTML::Scrubber>\n\nBugs / Feature Requests\nPlease report any bugs or feature requests by email to \"bug-html-scrubber at rt.cpan.org\", or\nthrough the web interface at 
<https://rt.cpan.org/Public/Bug/Report.html?Queue=HTML-Scrubber>.\nYou will be automatically notified of any progress on the request by the system.\n"
                    },
                    {
                        "name": "Source Code",
                        "content": "The code is open to the world, and available for you to hack on. Please feel free to browse it\nand play with it, or whatever. If you want to contribute patches, please send me a diff or prod\nme to pull from your repository :)\n\n<https://github.com/nigelm/html-scrubber>\n\ngit clone https://github.com/nigelm/html-scrubber.git\n"
                    }
                ]
            },
            "CONTRIBUTORS": {
                "content": "*   Andrei Vereha <avereha@gmail.com>\n\n*   Lee Johnson <lee@givengain.ch>\n\n*   Michael Caron <michael.r.caron@gmail.com>\n\n*   Michael Caron <mrcaron@users.noreply.github.com>\n\n*   Nigel Metheringham <nm9762github@muesli.org.uk>\n\n*   Paul Cochrane <paul@liekut.de>\n\n*   Ruslan Zakirov <ruz@bestpractical.com>\n\n*   Sergey Romanov <complefor@rambler.ru>\n\n*   vagrant <vagrant@precise64.(none)>\n",
                "subsections": []
            }
        }
    }
}