NAME
    XML::DOM - A Perl module for building DOM Level 1 compliant document structures

SYNOPSIS
     use XML::DOM;

     my $parser = XML::DOM::Parser->new;
     my $doc = $parser->parsefile ("file.xml");

     # print all HREF attributes of all CODEBASE elements
     my $nodes = $doc->getElementsByTagName ("CODEBASE");
     my $n = $nodes->getLength;

     for (my $i = 0; $i < $n; $i++)
     {
         my $node = $nodes->item ($i);
         my $href = $node->getAttributeNode ("HREF");

         # getAttributeNode returns undef when the attribute is missing
         next unless defined $href;
         print $href->getValue . "\n";
     }

     # Print the document to a file
     $doc->printToFile ("out.xml");

     # Print to string
     print $doc->toString;

     # Avoid memory leaks - cleanup circular references for garbage collection
     $doc->dispose;

DESCRIPTION
    This module extends the XML::Parser module by Clark Cooper. The XML::Parser module is built on
    top of XML::Parser::Expat, which is a lower level interface to James Clark's expat library.

    XML::DOM::Parser is derived from XML::Parser. It parses XML strings or files and builds a data
    structure that conforms to the API of the Document Object Model as described at
    http://www.w3.org/TR/REC-DOM-Level-1. See the XML::Parser manpage for other available features
    of the XML::DOM::Parser class. Note that the 'Style' property should not be used (it is set
    internally.)

    The XML::Parser *NoExpand* option is more or less supported, in that it will generate
    EntityReference objects whenever an entity reference is encountered in character data. I'm not
    sure how useful this is. Any comments are welcome.

    As described in the synopsis, when you create an XML::DOM::Parser object, the parse and
    parsefile methods create an *XML::DOM::Document* object from the specified input. This Document
    object can then be examined, modified and written back out to a file or converted to a string.

    When using XML::DOM with XML::Parser version 2.19 and up, setting the XML::DOM::Parser option
    *KeepCDATA* to 1 will store CDATASections in CDATASection nodes, instead of converting them to
    Text nodes. Adjacent CDATASection nodes will be merged into one. Let me know if this is a
    problem.
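
    As a minimal sketch of the *KeepCDATA* option described above (the sample XML string and
    variable names are made up for illustration), the following parses a small document and checks
    whether the CDATA section survived as a CDATASection node:

```perl
use strict;
use warnings;
use XML::DOM;

# KeepCDATA is the XML::DOM::Parser option described in the text.
my $parser = XML::DOM::Parser->new(KeepCDATA => 1);
my $doc    = $parser->parse('<note><![CDATA[a < b]]></note>');

# The only child of <note> should be the CDATA section.
my $child = $doc->getDocumentElement->getFirstChild;
my $kept  = $child->getNodeType == CDATA_SECTION_NODE ? 1 : 0;
my $data  = $child->getData;

print $kept ? "CDATASection kept\n" : "converted to Text\n";
print "content: $data\n";

# Break circular references once the tree is no longer needed.
$doc->dispose;
```

    Without the option, the same check would be expected to report a Text node instead.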

    When using XML::Parser 2.27 and above, you can suppress expansion of parameter entity references
    (e.g. %pent;) in the DTD, by setting *ParseParamEnt* to 1 and *ExpandParamEnt* to 0. See Hidden
    Nodes for details.

    A Document has a tree structure consisting of *Node* objects. A Node may contain other nodes,
    depending on its type. A Document may have Element, Text, Comment, and CDATASection nodes.
    Element nodes may have Attr, Element, Text, Comment, and CDATASection nodes. The other nodes may
    not have any child nodes.
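
    The tree structure can be explored with a short recursive walk. The sketch below is only an
    illustration: the helper name outline and the sample document are assumptions, not part of the
    module. It records one indented line per Element and per non-blank Text node:

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical helper: collect an indented outline of the subtree at $node.
sub outline {
    my ($node, $depth, $lines) = @_;
    my $type = $node->getNodeType;

    if ($type == ELEMENT_NODE) {
        push @$lines, ('  ' x $depth) . $node->getNodeName;
    }
    elsif ($type == TEXT_NODE) {
        my $text = $node->getData;
        push @$lines, ('  ' x $depth) . qq("$text") if $text =~ /\S/;
    }

    # In list context getChildNodes returns a plain Perl list,
    # which is empty for nodes that cannot have children.
    outline($_, $depth + 1, $lines) for $node->getChildNodes;
    return $lines;
}

my $doc   = XML::DOM::Parser->new->parse('<a><b>hi</b><c/></a>');
my $lines = outline($doc->getDocumentElement, 0, []);
print "$_\n" for @$lines;

$doc->dispose;
```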

    This module adds several node types that are not part of the DOM spec (yet.) These are:
    ElementDecl (for <!ELEMENT ...> declarations), AttlistDecl (for <!ATTLIST ...> declarations),
    XMLDecl (for <?xml ...?> declarations) and AttDef (for attribute definitions in an AttlistDecl.)

XML::DOM Classes
    The XML::DOM module stores XML documents in a tree structure with a root node of type
    XML::DOM::Document. Different nodes in the tree represent different parts of the XML file. The
    DOM Level 1 Specification defines the following node types:

    *   XML::DOM::Node - Super class of all node types

    *   XML::DOM::Document - The root of the XML document

    *   XML::DOM::DocumentType - Describes the document structure: <!DOCTYPE root [ ... ]>

    *   XML::DOM::Element - An XML element: <elem attr="val"> ... </elem>

    *   XML::DOM::Attr - An XML element attribute: name="value"

    *   XML::DOM::CharacterData - Super class of Text, Comment and CDATASection

    *   XML::DOM::Text - Text in an XML element

    *   XML::DOM::CDATASection - Escaped block of text: <![CDATA[ text ]]>

    *   XML::DOM::Comment - An XML comment: <!-- comment -->

    *   XML::DOM::EntityReference - Refers to an ENTITY: &ent; or %ent;

    *   XML::DOM::Entity - An ENTITY definition: <!ENTITY ...>

    *   XML::DOM::ProcessingInstruction - <?PI target>

    *   XML::DOM::DocumentFragment - Lightweight node for cut & paste

    *   XML::DOM::Notation - A NOTATION definition: <!NOTATION ...>

    In addition, the XML::DOM module contains the following nodes that are not part of the DOM Level
    1 Specification:

    *   XML::DOM::ElementDecl - Defines an element: <!ELEMENT ...>

    *   XML::DOM::AttlistDecl - Defines one or more attributes in an <!ATTLIST ...>

    *   XML::DOM::AttDef - Defines one attribute in an <!ATTLIST ...>

    *   XML::DOM::XMLDecl - An XML declaration: <?xml version="1.0" ...>

    Other classes that are part of the DOM Level 1 Spec:

    *   XML::DOM::Implementation - Provides information about this implementation. Currently it
        doesn't do much.

    *   XML::DOM::NodeList - Used internally to store a node's child nodes. Also returned by
        getElementsByTagName.

    *   XML::DOM::NamedNodeMap - Used internally to store an element's attributes.

    Other classes that are not part of the DOM Level 1 Spec:

    *   XML::DOM::Parser - A non-validating XML parser that creates XML::DOM::Documents

    *   XML::DOM::ValParser - A validating XML parser that creates XML::DOM::Documents. It uses
        XML::Checker to check against the DocumentType (DTD).

    *   XML::Handler::BuildDOM - A PerlSAX handler that creates XML::DOM::Documents.

XML::DOM package
    Constant definitions
        The following predefined constants identify the type of a node.

     UNKNOWN_NODE (0)                The node type is unknown (not part of DOM)

     ELEMENT_NODE (1)                The node is an Element.
     ATTRIBUTE_NODE (2)              The node is an Attr.
     TEXT_NODE (3)                   The node is a Text node.
     CDATA_SECTION_NODE (4)          The node is a CDATASection.
     ENTITY_REFERENCE_NODE (5)       The node is an EntityReference.
     ENTITY_NODE (6)                 The node is an Entity.
     PROCESSING_INSTRUCTION_NODE (7) The node is a ProcessingInstruction.
     COMMENT_NODE (8)                The node is a Comment.
     DOCUMENT_NODE (9)               The node is a Document.
     DOCUMENT_TYPE_NODE (10)         The node is a DocumentType.
     DOCUMENT_FRAGMENT_NODE (11)     The node is a DocumentFragment.
     NOTATION_NODE (12)              The node is a Notation.

     ELEMENT_DECL_NODE (13)          The node is an ElementDecl (not part of DOM)
     ATT_DEF_NODE (14)               The node is an AttDef (not part of DOM)
     XML_DECL_NODE (15)              The node is an XMLDecl (not part of DOM)
     ATTLIST_DECL_NODE (16)          The node is an AttlistDecl (not part of DOM)

     Usage:

       if ($node->getNodeType == ELEMENT_NODE)
       {
           print "It's an Element";
       }

    Not In DOM Spec: The DOM Spec does not mention UNKNOWN_NODE and, quite frankly, you should never
    encounter it. The last 4 node types were added to support the 4 added node classes.

  Global Variables
    $VERSION
        The variable $XML::DOM::VERSION contains the version number of this implementation, e.g.
        "1.43".

  METHODS
    These methods are not part of the DOM Level 1 Specification.

    getIgnoreReadOnly and ignoreReadOnly (readOnly)
        The DOM Level 1 Spec does not allow you to edit certain sections of the document, e.g. the
        DocumentType, so by default this implementation throws DOMExceptions (i.e.
        NO_MODIFICATION_ALLOWED_ERR) when you try to edit a readonly node. These readonly checks can
        be disabled by (temporarily) setting the global IgnoreReadOnly flag.

        The ignoreReadOnly method sets the global IgnoreReadOnly flag and returns its previous
        value. The getIgnoreReadOnly method simply returns its current value.

         my $oldIgnore = XML::DOM::ignoreReadOnly (1);
         eval {
         ... do whatever you want, catching any other exceptions ...
         };
         XML::DOM::ignoreReadOnly ($oldIgnore);     # restore previous value

        Another way to do it, using a local variable:

         { # start new scope
            local $XML::DOM::IgnoreReadOnly = 1;
            ... do whatever you want, don't worry about exceptions ...
         } # end of scope ($IgnoreReadOnly is set back to its previous value)
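
        The scoped variant can be wrapped in a small helper. This is a sketch only: the name
        with_readonly_ignored is hypothetical and not part of XML::DOM. Because it uses local, the
        previous flag value comes back even when the callback dies:

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical helper: run $code with the read-only checks disabled.
# 'local' restores the previous flag value when the sub is left,
# whether it returns normally or dies.
sub with_readonly_ignored {
    my ($code) = @_;
    local $XML::DOM::IgnoreReadOnly = 1;
    return $code->();
}

my $before = XML::DOM::getIgnoreReadOnly() ? 1 : 0;
my $during = with_readonly_ignored(sub { XML::DOM::getIgnoreReadOnly() }) ? 1 : 0;

# A dying callback must not leave the flag switched on.
eval { with_readonly_ignored(sub { die "edit failed\n" }) };
my $after = XML::DOM::getIgnoreReadOnly() ? 1 : 0;

print "before=$before during=$during after=$after\n";
```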

    isValidName (name)
        Whether the specified name is a valid "Name" as specified in the XML spec. Characters with
        Unicode values > 127 are now also supported.

    getAllowReservedNames and allowReservedNames (boolean)
        The first method returns whether reserved names are allowed. The second takes a boolean
        argument and sets whether reserved names are allowed. The initial value is 1 (i.e. allow
        reserved names.)

        The XML spec states that "Names" starting with (X|x)(M|m)(L|l) are reserved for future use.
        (Amusingly enough, the XML version of the XML spec (REC-xml-19980210.xml) breaks that very
        rule by defining an ENTITY with the name 'xmlpio'.) A "Name" in this context means the Name
        token as found in the BNF rules in the XML spec.

        XML::DOM only checks for errors when you modify the DOM tree, not when the DOM tree is built
        by the XML::DOM::Parser.
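
        A short sketch of how isValidName and the reserved-names switch interact, assuming the
        behaviour documented above (the sample names are arbitrary). The previous setting is saved
        and restored because the switch is global:

```perl
use strict;
use warnings;
use XML::DOM;

# An ordinary name, and one that starts with a digit (not a valid XML Name).
my $plain_ok = XML::DOM::isValidName('item')  ? 1 : 0;
my $digit_ok = XML::DOM::isValidName('1item') ? 1 : 0;

# Temporarily reject names that start with "xml" in any case.
my $was_allowed = XML::DOM::getAllowReservedNames();
XML::DOM::allowReservedNames(0);
my $reserved_ok = XML::DOM::isValidName('xmlitem') ? 1 : 0;
XML::DOM::allowReservedNames($was_allowed);

print "item: $plain_ok, 1item: $digit_ok, xmlitem (reserved off): $reserved_ok\n";
```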

    setTagCompression (funcref)
        There are 3 possible styles for printing empty Element tags:

        Style 0
             <empty/> or <empty attr="val"/>

            XML::DOM uses this style by default for all Elements.

        Style 1
              <empty></empty> or <empty attr="val"></empty>

        Style 2
              <empty /> or <empty attr="val" />

            This style is sometimes desired when using XHTML. (Note the extra space before the slash
            "/") See <http://www.w3.org/TR/xhtml1> Appendix C for more details.

        By default XML::DOM compresses all empty Element tags (style 0.) You can control which style
        is used for a particular Element by calling XML::DOM::setTagCompression with a reference to
        a function that takes 2 arguments. The first is the tag name of the Element, the second is
        the XML::DOM::Element that is being printed. The function should return 0, 1 or 2 to
        indicate which style should be used to print the empty tag. E.g.

         XML::DOM::setTagCompression (\&my_tag_compression);

         sub my_tag_compression
         {
            my ($tag, $elem) = @_;

            # Print empty br, hr and img tags like this: <br />
            return 2 if $tag =~ /^(br|hr|img)$/;

            # Print other empty tags like this: <empty></empty>
            return 1;
         }
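
        To see the effect on actual output, the following sketch serializes the same small tree
        before and after installing a compression function. The sample markup is made up, and
        restoring the default by installing a function that always returns 0 is an assumption
        based on the style numbers described above:

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<p><br/><span/></p>');
my $root = $doc->getDocumentElement;

# Default behaviour: every empty element uses style 0.
my $default = $root->toString;

# Style 2 for <br>, style 1 for every other empty element.
XML::DOM::setTagCompression(sub {
    my ($tag, $elem) = @_;
    return $tag eq 'br' ? 2 : 1;
});
my $custom = $root->toString;

# Go back to style 0 for all elements (the setting is global).
XML::DOM::setTagCompression(sub { 0 });
$doc->dispose;

print "default: $default\n";
print "custom:  $custom\n";
```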

IMPLEMENTATION DETAILS
    *   Perl Mappings

        The value undef was used when the DOM Spec said null.

        The DOM Spec says: Applications must encode DOMString using UTF-16 (defined in Appendix C.3
        of [UNICODE] and Amendment 1 of [ISO-10646]). In this implementation we use plain old Perl
        strings encoded in UTF-8 instead of UTF-16.

    *   Text and CDATASection nodes

        The Expat parser expands EntityReferences and CDATASection sections to raw strings and does
        not indicate where they were found. This implementation therefore converts both to Text
        nodes at parse time. CDATASection and EntityReference nodes that are added to an existing
        Document (by the user) will be preserved.

        Also, adjacent Text nodes are always merged at parse time. Text nodes that are added later
        can be merged with the normalize method. Consider using the addText method when adding Text
        nodes.

    *   Printing and toString

        When printing (and converting an XML Document to a string) the strings have to be encoded
        differently depending on where they occur. E.g. in a CDATASection all substrings are allowed
        except for "]]>". In regular text, certain characters are not allowed, e.g. ">" has to be
        converted to "&gt;". These routines should be verified by someone who knows the details.

    *   Quotes

        Certain sections in XML are quoted, like attribute values in an Element. XML::Parser strips
        these quotes and the print methods in this implementation always use double quotes, so when
        parsing and printing a document, single quotes may be converted to double quotes. The
        default value of an attribute definition (AttDef) in an AttlistDecl, however, will maintain
        its quotes.

    *   AttlistDecl

        Attribute declarations for a certain Element are always merged into a single AttlistDecl
        object.

    *   Comments

        Comments in the DOCTYPE section are not kept in the right place. They will become child
        nodes of the Document.

    *   Hidden Nodes

        Previous versions of XML::DOM would expand parameter entity references (like %pent;), so
        when printing the DTD, it would print the contents of the external entity, instead of the
        parameter entity reference. With this release (1.27), you can prevent this by setting the
        XML::DOM::Parser options ParseParamEnt => 1 and ExpandParamEnt => 0.

        When it is parsing the contents of the external entities, it *DOES* still add the nodes to
        the DocumentType, but it marks these nodes by setting the 'Hidden' property. In addition, it
        adds an EntityReference node to the DocumentType node.

        When printing the DocumentType node (or when using to_expat() or to_sax()), the 'Hidden'
        nodes are suppressed, so you will see the parameter entity reference instead of the contents
        of the external entities. See test case t/dom_extent.t for an example.

        The reason for adding the 'Hidden' nodes to the DocumentType node, is that the nodes may
        contain <!ENTITY> definitions that are referenced further in the document. (Simply not
        adding the nodes to the DocumentType could cause such entity references to be expanded
        incorrectly.)

        Note that you need XML::Parser 2.27 or higher for this to work correctly.
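
    Two of the points in the list above, the handling of Text nodes that are added after parsing
    and the conversion of single quotes to double quotes, can be observed with a small sketch. The
    sample element and strings are invented for illustration:

```perl
use strict;
use warnings;
use XML::DOM;

# The attribute value is single-quoted in the input.
my $doc  = XML::DOM::Parser->new->parse(q{<msg lang='en'/>});
my $root = $doc->getDocumentElement;

# Two appendChild calls leave two separate Text nodes side by side.
$root->appendChild($doc->createTextNode('Hello, '));
$root->appendChild($doc->createTextNode('world'));
my $count_before = $root->getChildNodes->getLength;

# normalize merges neighbouring Text nodes.
$root->normalize;
my $count_after = $root->getChildNodes->getLength;

# addText extends a trailing Text node instead of adding a new one.
$root->addText('!');
my $count_final = $root->getChildNodes->getLength;

# Printing always uses double quotes around attribute values.
my $xml = $root->toString;
$doc->dispose;

print "children: $count_before -> $count_after -> $count_final\n";
print "$xml\n";
```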

SEE ALSO
    XML::DOM::XPath

    The Japanese version of this document by Takanori Kawai (Hippo2000) at
    <http://member.nifty.ne.jp/hippo2000/perltips/xml/dom.htm>

    The DOM Level 1 specification at <http://www.w3.org/TR/REC-DOM-Level-1>

    The XML spec (Extensible Markup Language 1.0) at <http://www.w3.org/TR/REC-xml>

    The XML::Parser and XML::Parser::Expat manual pages.

    XML::LibXML also provides a DOM Parser, and is significantly faster than XML::DOM, and is under
    active development. It requires that you download the Gnome libxml library.

    XML::GDOME will provide the DOM Level 2 Core API, and should be as fast as XML::LibXML, but more
    robust, since it uses the memory management functions of libgdome. For more details see
    <http://tjmather.com/xml-gdome/>

CAVEATS
    The method getElementsByTagName() does not return a "live" NodeList. Whether this is an actual
    caveat is debatable, but a few people on the www-dom mailing list seemed to think so. I haven't
    decided yet. It's a pain to implement, it slows things down and the benefits seem marginal. Let
    me know what you think.
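
    The practical consequence is that a NodeList obtained earlier does not notice later changes to
    the tree. A minimal sketch, using a made-up document, of why the query has to be repeated after
    a modification:

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<list><item/><item/></list>');
my $root = $doc->getDocumentElement;

# In scalar context getElementsByTagName returns an XML::DOM::NodeList.
my $items        = $doc->getElementsByTagName('item');
my $count_before = $items->getLength;

# Add a third <item> after the list was obtained.
$root->appendChild($doc->createElement('item'));

my $stale_count = $items->getLength;                               # old snapshot
my $fresh_count = $doc->getElementsByTagName('item')->getLength;   # new query

print "before=$count_before stale=$stale_count fresh=$fresh_count\n";
$doc->dispose;
```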

AUTHOR
    Enno Derksen is the original author.

    Send patches to T.J. Mather at <tjmather AT maxmind.com>.

    Paid support is available directly from the maintainers of this package. Please see
    <http://www.maxmind.com/app/opensourceservices> for more details.

    Thanks to Clark Cooper for his help with the initial version.