{
    "mode": "perldoc",
    "parameter": "WWW::RobotRules",
    "section": "",
    "url": "https://www.chedong.com/phpMan.php/perldoc/WWW%3A%3ARobotRules/json",
    "generated": "2026-06-11T17:11:47Z",
    "synopsis": "use WWW::RobotRules;\nmy $rules = WWW::RobotRules->new('MOMspider/1.0');\nuse LWP::Simple qw(get);\n{\nmy $url = \"http://some.place/robots.txt\";\nmy $robotstxt = get $url;\n$rules->parse($url, $robotstxt) if defined $robotstxt;\n}\n{\nmy $url = \"http://some.other.place/robots.txt\";\nmy $robotstxt = get $url;\n$rules->parse($url, $robotstxt) if defined $robotstxt;\n}\n# Now we can check if a URL is valid for those servers\n# whose \"robots.txt\" files we've gotten and parsed:\nif($rules->allowed($url)) {\n$c = get $url;\n...\n}",
    "sections": {
        "NAME": {
            "content": "WWW::RobotRules - database of robots.txt-derived permissions\n",
            "subsections": []
        },
        "SYNOPSIS": {
            "content": "use WWW::RobotRules;\nmy $rules = WWW::RobotRules->new('MOMspider/1.0');\n\nuse LWP::Simple qw(get);\n\n{\nmy $url = \"http://some.place/robots.txt\";\nmy $robotstxt = get $url;\n$rules->parse($url, $robotstxt) if defined $robotstxt;\n}\n\n{\nmy $url = \"http://some.other.place/robots.txt\";\nmy $robotstxt = get $url;\n$rules->parse($url, $robotstxt) if defined $robotstxt;\n}\n\n# Now we can check if a URL is valid for those servers\n# whose \"robots.txt\" files we've gotten and parsed:\nif($rules->allowed($url)) {\n$c = get $url;\n...\n}\n",
            "subsections": []
        },
        "DESCRIPTION": {
            "content": "This module parses /robots.txt files as specified in \"A Standard for Robot Exclusion\", at\n<http://www.robotstxt.org/wc/norobots.html> Webmasters can use the /robots.txt file to forbid\nconforming robots from accessing parts of their web site.\n\nThe parsed files are kept in a WWW::RobotRules object, and this object provides methods to check\nif access to a given URL is prohibited. The same WWW::RobotRules object can be used for one or\nmore parsed /robots.txt files on any number of hosts.\n\nThe following methods are provided:\n\n$rules = WWW::RobotRules->new($robotname)\nThis is the constructor for WWW::RobotRules objects. The first argument given to new() is\nthe name of the robot.\n\n$rules->parse($robottxturl, $content, $freshuntil)\nThe parse() method takes as arguments the URL that was used to retrieve the /robots.txt\nfile, and the contents of the file.\n\n$rules->allowed($uri)\nReturns TRUE if this robot is allowed to retrieve this URL.\n\n$rules->agent([$name])\nGet/set the agent name. NOTE: Changing the agent name will clear the robots.txt rules and\nexpire times out of the cache.\n\nROBOTS.TXT\nThe format and semantics of the \"/robots.txt\" file are as follows (this is an edited abstract of\n<http://www.robotstxt.org/wc/norobots.html>):\n\nThe file consists of one or more records separated by one or more blank lines. Each record\ncontains lines of the form\n\n<field-name>: <value>\n\nThe field name is case insensitive. Text after the '#' character on a line is ignored during\nparsing. This is used for comments. The following <field-names> can be used:\n\nUser-Agent\nThe value of this field is the name of the robot the record is describing access policy for.\nIf more than one *User-Agent* field is present the record describes an identical access\npolicy for more than one robot. At least one field needs to be present per record. If the\nvalue is '*', the record describes the default access policy for any robot that has not not\nmatched any of the other records.\n\nThe *User-Agent* fields must occur before the *Disallow* fields. If a record contains a\n*User-Agent* field after a *Disallow* field, that constitutes a malformed record. This parser\nwill assume that a blank line should have been placed before that *User-Agent* field, and\nwill break the record into two. All the fields before the *User-Agent* field will constitute\na record, and the *User-Agent* field will be the first field in a new record.\n\nDisallow\nThe value of this field specifies a partial URL that is not to be visited. This can be a full\npath, or a partial path; any URL that starts with this value will not be retrieved\n\nUnrecognized records are ignored.\n\nROBOTS.TXT EXAMPLES\nThe following example \"/robots.txt\" file specifies that no robots should visit any URL starting\nwith \"/cyberworld/map/\" or \"/tmp/\":\n\nUser-agent: *\nDisallow: /cyberworld/map/ # This is an infinite virtual URL space\nDisallow: /tmp/ # these will soon disappear\n\nThis example \"/robots.txt\" file specifies that no robots should visit any URL starting with\n\"/cyberworld/map/\", except the robot called \"cybermapper\":\n\nUser-agent: *\nDisallow: /cyberworld/map/ # This is an infinite virtual URL space\n\n# Cybermapper knows where to go.\nUser-agent: cybermapper\nDisallow:\n\nThis example indicates that no robots should visit this site further:\n\n# go away\nUser-agent: *\nDisallow: /\n\nThis is an example of a malformed robots.txt file.\n\n# robots.txt for ancientcastle.example.com\n# I've locked myself away.\nUser-agent: *\nDisallow: /\n# The castle is your home now, so you can go anywhere you like.\nUser-agent: Belle\nDisallow: /west-wing/ # except the west wing!\n# It's good to be the Prince...\nUser-agent: Beast\nDisallow:\n\nThis file is missing the required blank lines between records. However, the intention is clear.\n",
            "subsections": []
        },
        "SEE ALSO": {
            "content": "LWP::RobotUA, WWW::RobotRules::AnyDBMFile\n",
            "subsections": []
        },
        "COPYRIGHT": {
            "content": "Copyright 1995-2009, Gisle Aas\nCopyright 1995, Martijn Koster\n\nThis library is free software; you can redistribute it and/or modify it under the same terms as\nPerl itself.\n",
            "subsections": []
        }
    },
    "summary": "WWW::RobotRules - database of robots.txt-derived permissions",
    "flags": [],
    "examples": [],
    "see_also": []
}