XML::DOM(3)           User Contributed Perl Documentation          XML::DOM(3)



NAME
       XML::DOM - A perl module for building DOM Level 1 compliant document structures

SYNOPSIS
        use XML::DOM;

        my $parser = XML::DOM::Parser->new;
        my $doc = $parser->parsefile ("file.xml");

        # print all HREF attributes of all CODEBASE elements
        my $nodes = $doc->getElementsByTagName ("CODEBASE");
        my $n = $nodes->getLength;

        for (my $i = 0; $i < $n; $i++)
        {
            my $node = $nodes->item ($i);
            my $href = $node->getAttributeNode ("HREF");
            print $href->getValue . "\n";
        }

        # Print doc file
        $doc->printToFile ("out.xml");

        # Print to string
        print $doc->toString;

        # Avoid memory leaks - cleanup circular references for garbage collection
        $doc->dispose;

DESCRIPTION
       This module extends the XML::Parser module by Clark Cooper.  The XML::Parser module
       is built on top of XML::Parser::Expat, which is a lower level interface to James
       Clark’s expat library.

       XML::DOM::Parser is derived from XML::Parser. It parses XML strings or files and
       builds a data structure that conforms to the API of the Document Object Model as
       described at http://www.w3.org/TR/REC-DOM-Level-1.  See the XML::Parser manpage for
       other available features of the XML::DOM::Parser class.  Note that the ’Style’
       property should not be used (it is set internally.)

       The XML::Parser NoExpand option is more or less supported, in that it will generate
       EntityReference objects whenever an entity reference is encountered in character
       data. I’m not sure how useful this is. Any comments are welcome.

       As described in the synopsis, when you create an XML::DOM::Parser object, the parse
       and parsefile methods create an XML::DOM::Document object from the specified input.
       This Document object can then be examined, modified and written back out to a file
       or converted to a string.
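
       As a rough illustration of that parse, examine, modify and serialize cycle, the
       sketch below parses an in-memory string instead of a file. The XML content and
       the variable names are made up for the example; only the XML::DOM calls come from
       the module.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical input; any well-formed XML string works here.
my $xml = '<catalog><item id="1">first</item></catalog>';

my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parse($xml);    # returns an XML::DOM::Document

# Examine: fetch the first <item> element and read its attribute.
my $item = $doc->getElementsByTagName("item")->item(0);
my $id   = $item->getAttribute("id");

# Modify: change the attribute, then convert the tree back to a string.
$item->setAttribute("id", "2");
my $out = $doc->toString;
print $out, "\n";

# Break circular references so the tree can be garbage collected.
$doc->dispose;
```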

       When using XML::DOM with XML::Parser version 2.19 and up, setting the
       XML::DOM::Parser option KeepCDATA to 1 will store CDATASections in CDATASection
       nodes, instead of converting them to Text nodes.  Adjacent CDATASection nodes
       are merged into one. Let me know if this is a problem.
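
       A minimal sketch of the KeepCDATA option, assuming XML::Parser 2.19 or later is
       installed. The sample document is invented; the check simply inspects the node
       type of the first child of the root element.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical document with a single CDATA section.
my $xml = '<note><![CDATA[1 < 2 && 3 > 2]]></note>';

# KeepCDATA => 1 asks the parser to store CDATASection nodes as such.
my $parser = XML::DOM::Parser->new(KeepCDATA => 1);
my $doc    = $parser->parse($xml);

my $child    = $doc->getDocumentElement->getFirstChild;
my $is_cdata = ($child->getNodeType == CDATA_SECTION_NODE) ? 1 : 0;
my $data     = $child->getData;

print "stored as CDATASection: $is_cdata\n";
print "data: $data\n";

$doc->dispose;
```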

       When using XML::Parser 2.27 and above, you can suppress expansion of parameter
       entity references (e.g. %pent;) in the DTD, by setting ParseParamEnt to 1 and
       ExpandParamEnt to 0. See Hidden Nodes for details.

       A Document has a tree structure consisting of Node objects. A Node may contain
       other nodes, depending on its type.  A Document may have Element, Text, Comment,
       and CDATASection nodes.  Element nodes may have Attr, Element, Text, Comment, and
       CDATASection nodes.  The other nodes may not have any child nodes.
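
       The tree structure can be explored with the standard DOM navigation methods. The
       following sketch walks a small, made-up document depth first and records one
       indented line per node; walk is a hypothetical helper, not part of XML::DOM.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical helper: append one indented line per node to @$lines.
sub walk {
    my ($node, $depth, $lines) = @_;
    push @$lines, ("  " x $depth) . $node->getNodeName;

    # Visit the children in document order.
    for (my $c = $node->getFirstChild; defined $c; $c = $c->getNextSibling) {
        walk($c, $depth + 1, $lines);
    }
}

my $doc = XML::DOM::Parser->new->parse('<a><b>hi</b><!-- c --><d/></a>');

my @lines;
walk($doc, 0, \@lines);
print "$_\n" for @lines;

$doc->dispose;
```

       Attr nodes do not show up in this walk because they are reached through the
       owning Element's attributes rather than through its child list.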

       This module adds several node types that are not part of the DOM spec (yet.)  These
       are: ElementDecl (for <!ELEMENT ...> declarations), AttlistDecl (for <!ATTLIST ...>
       declarations), XMLDecl (for <?xml ...?> declarations) and AttDef (for attribute
       definitions in an AttlistDecl.)

XML::DOM Classes
       The XML::DOM module stores XML documents in a tree structure with a root node of
       type XML::DOM::Document. Different nodes in the tree represent different parts of the
       XML file. The DOM Level 1 Specification defines the following node types:

       * XML::DOM::Node - Super class of all node types
       * XML::DOM::Document - The root of the XML document
       * XML::DOM::DocumentType - Describes the document structure:
       <!DOCTYPE root [ ... ]>
       * XML::DOM::Element - An XML element: <elem attr="val"> ... </elem>
       * XML::DOM::Attr - An XML element attribute: name="value"
       * XML::DOM::CharacterData - Super class of Text, Comment and CDATASection
       * XML::DOM::Text - Text in an XML element
       * XML::DOM::CDATASection - Escaped block of text: <![CDATA[ text ]]>
       * XML::DOM::Comment - An XML comment: <!-- comment -->
       * XML::DOM::EntityReference - Refers to an ENTITY: &ent; or %ent;
       * XML::DOM::Entity - An ENTITY definition: <!ENTITY ...>
       * XML::DOM::ProcessingInstruction - A processing instruction: <?target data?>
       * XML::DOM::DocumentFragment - Lightweight node for cut & paste
       * XML::DOM::Notation - A NOTATION definition: <!NOTATION ...>

       In addition, the XML::DOM module contains the following nodes that are not part of
       the DOM Level 1 Specification:

       * XML::DOM::ElementDecl - Defines an element: <!ELEMENT ...>
       * XML::DOM::AttlistDecl - Defines one or more attributes in an <!ATTLIST ...>
       * XML::DOM::AttDef - Defines one attribute in an <!ATTLIST ...>
       * XML::DOM::XMLDecl - An XML declaration: <?xml version="1.0" ...>

       Other classes that are part of the DOM Level 1 Spec:

       * XML::DOM::Implementation - Provides information about this implementation.
       Currently it doesn’t do much.
       * XML::DOM::NodeList - Used internally to store a node’s child nodes. Also returned
       by getElementsByTagName.
       * XML::DOM::NamedNodeMap - Used internally to store an element’s attributes.

       Other classes that are not part of the DOM Level 1 Spec:

       * XML::DOM::Parser - A non-validating XML parser that creates XML::DOM::Documents
       * XML::DOM::ValParser - A validating XML parser that creates XML::DOM::Documents.
       It uses XML::Checker to check against the DocumentType (DTD)
       * XML::Handler::BuildDOM - A PerlSAX handler that creates XML::DOM::Documents.

XML::DOM package
       Constant definitions
           The following predefined constants indicate which type of node it is.

        UNKNOWN_NODE (0)                The node type is unknown (not part of DOM)

        ELEMENT_NODE (1)                The node is an Element.
        ATTRIBUTE_NODE (2)              The node is an Attr.
        TEXT_NODE (3)                   The node is a Text node.
        CDATA_SECTION_NODE (4)          The node is a CDATASection.
        ENTITY_REFERENCE_NODE (5)       The node is an EntityReference.
        ENTITY_NODE (6)                 The node is an Entity.
        PROCESSING_INSTRUCTION_NODE (7) The node is a ProcessingInstruction.
        COMMENT_NODE (8)                The node is a Comment.
        DOCUMENT_NODE (9)               The node is a Document.
        DOCUMENT_TYPE_NODE (10)         The node is a DocumentType.
        DOCUMENT_FRAGMENT_NODE (11)     The node is a DocumentFragment.
        NOTATION_NODE (12)              The node is a Notation.

        ELEMENT_DECL_NODE (13)          The node is an ElementDecl (not part of DOM)
        ATT_DEF_NODE (14)               The node is an AttDef (not part of DOM)
        XML_DECL_NODE (15)              The node is an XMLDecl (not part of DOM)
        ATTLIST_DECL_NODE (16)          The node is an AttlistDecl (not part of DOM)

        Usage:

          if ($node->getNodeType == ELEMENT_NODE)
          {
              print "It's an Element";
          }

       Not In DOM Spec: The DOM Spec does not mention UNKNOWN_NODE and, quite frankly, you
       should never encounter it. The last 4 node types were added to support the 4 added
       node classes.

       Global Variables


       $VERSION
           The variable $XML::DOM::VERSION contains the version number of this
           implementation, e.g. "1.07".

       METHODS

       These methods are not part of the DOM Level 1 Specification.

       getIgnoreReadOnly and ignoreReadOnly (readOnly)
           The DOM Level 1 Spec does not allow you to edit certain sections of the
           document, e.g. the DocumentType, so by default this implementation throws
           DOMExceptions (i.e. NO_MODIFICATION_ALLOWED_ERR) when you try to edit a
           readonly node.  These readonly checks can be disabled by (temporarily) setting
           the global IgnoreReadOnly flag.

           The ignoreReadOnly method sets the global IgnoreReadOnly flag and returns its
           previous value. The getIgnoreReadOnly method simply returns its current value.

            my $oldIgnore = XML::DOM::ignoreReadOnly (1);
            eval {
                # ... do whatever you want, catching any other exceptions ...
            };
            XML::DOM::ignoreReadOnly ($oldIgnore);     # restore previous value

           Another way to do it, using a local variable:

            { # start new scope
               local $XML::DOM::IgnoreReadOnly = 1;
               # ... do whatever you want, don't worry about exceptions ...
            } # end of scope ($IgnoreReadOnly is set back to its previous value)

       isValidName (name)
           Whether the specified name is a valid "Name" as specified in the XML spec.
           Characters with Unicode values > 127 are now also supported.
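
           A small sketch of how isValidName might be used to screen candidate names
           before creating nodes. The candidate names are arbitrary examples.

```perl
use strict;
use warnings;
use XML::DOM;

# Arbitrary sample names: two that follow the XML "Name" rule, two that do not.
my @candidates = ("title", "my-tag_1", "1st", "has space");

my %valid;
for my $name (@candidates) {
    $valid{$name} = XML::DOM::isValidName($name) ? 1 : 0;
    print "$name: ", ($valid{$name} ? "valid" : "invalid"), "\n";
}
```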

       getAllowReservedNames and allowReservedNames (boolean)
           The first method returns whether reserved names are allowed.  The second takes
           a boolean argument and sets whether reserved names are allowed.  The initial
           value is 1 (i.e. allow reserved names.)

           The XML spec states that "Names" starting with (X|x)(M|m)(L|l) are reserved for
           future use. (Amusingly enough, the XML version of the XML spec
           (REC-xml-19980210.xml) breaks that very rule by defining an ENTITY with the
           name ’xmlpio’.)  A "Name" in this context means the Name token as found in the
           BNF rules in the XML spec.

           XML::DOM only checks for errors when you modify the DOM tree, not when the DOM
           tree is built by the XML::DOM::Parser.

       setTagCompression (funcref)
           There are 3 possible styles for printing empty Element tags:

           Style 0
                <empty/> or <empty attr="val"/>

               XML::DOM uses this style by default for all Elements.

           Style 1
                 <empty></empty> or <empty attr="val"></empty>

           Style 2
                 <empty /> or <empty attr="val" />

               This style is sometimes desired when using XHTML.  (Note the extra space
               before the slash "/") See <http://www.w3.org/TR/xhtml1> Appendix C for more
               details.

           By default XML::DOM compresses all empty Element tags (style 0.)  You can
           control which style is used for a particular Element by calling
           XML::DOM::setTagCompression with a reference to a function that takes 2
           arguments. The first is the tag name of the Element, the second is the
           XML::DOM::Element that is being printed.  The function should return 0, 1 or 2
           to indicate which style should be used to print the empty tag. E.g.

            XML::DOM::setTagCompression (\&my_tag_compression);

            sub my_tag_compression
            {
               my ($tag, $elem) = @_;

               # Print empty br, hr and img tags like this: <br />
               return 2 if $tag =~ /^(br|hr|img)$/;

               # Print other empty tags like this: <empty></empty>
               return 1;
            }

IMPLEMENTATION DETAILS
       * Perl Mappings
           The value undef was used when the DOM Spec said null.

           The DOM Spec says: Applications must encode DOMString using UTF-16 (defined in
           Appendix C.3 of [UNICODE] and Amendment 1 of [ISO-10646]).  In this
           implementation we use plain old Perl strings encoded in UTF-8 instead of UTF-16.

       * Text and CDATASection nodes
           The Expat parser expands EntityReferences and CDATASections to raw strings and
           does not indicate where they were found.  This implementation therefore
           converts both to Text nodes at parse time (unless the NoExpand or KeepCDATA
           parser options described above are used).  CDATASection and EntityReference
           nodes that are added to an existing Document (by the user) will be preserved.

           Also, adjacent Text nodes are always merged at parse time. Text nodes that
           are added later can be merged with the normalize method. Consider using the
           addText method when adding Text nodes.
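
           The difference between appending Text nodes by hand, normalize and addText can
           be sketched as follows. The element name and strings are invented for the
           example; the counts are read back from the child list.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse("<p/>");
my $root = $doc->getDocumentElement;

# Two Text nodes appended separately stay separate ...
$root->appendChild($doc->createTextNode("Hello, "));
$root->appendChild($doc->createTextNode("world"));
my $before = $root->getChildNodes->getLength;

# ... until normalize merges the adjacent Text nodes.
$root->normalize;
my $after = $root->getChildNodes->getLength;

# addText appends to a trailing Text node instead of creating a new node.
$root->addText("!");
my $final = $root->getChildNodes->getLength;
my $text  = $root->getFirstChild->getData;

print "children: $before, then $after, then $final\n";
print "text: $text\n";

$doc->dispose;
```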

       * Printing and toString
           When printing (and converting an XML Document to a string) the strings have to
           be encoded differently depending on where they occur. E.g. in a CDATASection all
           substrings are allowed except for "]]>". In regular text, certain characters
           are not allowed, e.g. ">" has to be converted to "&gt;".  These routines should
           be verified by someone who knows the details.
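
           To see the context-dependent encoding in action, the sketch below stores raw
           strings in a Text node and in an attribute value and then prints the element.
           The element, attribute and strings are made up; the exact set of characters
           that gets escaped is up to the module, so the example only looks at a few
           obvious ones.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse("<expr/>");
my $root = $doc->getDocumentElement;

# Store raw, unescaped strings in the tree.
$root->addText("a < b & c");
$root->setAttribute("note", 'say "hi"');

# Escaping happens when the tree is converted back to a string.
my $out = $root->toString;
print "$out\n";

$doc->dispose;
```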

       * Quotes
           Certain sections in XML are quoted, like attribute values in an Element.
           XML::Parser strips these quotes and the print methods in this implementation
           always use double quotes, so when parsing and printing a document, single
           quotes may be converted to double quotes. The default value of an attribute
           definition (AttDef) in an AttlistDecl, however, will maintain its quotes.
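
           A short sketch of the quote conversion, using an invented element whose
           attribute is written with single quotes in the input:

```perl
use strict;
use warnings;
use XML::DOM;

# The attribute value is single-quoted in the source text.
my $doc = XML::DOM::Parser->new->parse("<a href='x.html'/>");

# Printing the element writes the attribute with double quotes.
my $out = $doc->getDocumentElement->toString;
print "$out\n";

$doc->dispose;
```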

       * AttlistDecl
           Attribute declarations for a certain Element are always merged into a single
           AttlistDecl object.

       * Comments
           Comments in the DOCTYPE section are not kept in the right place. They will
           become child nodes of the Document.

       * Hidden Nodes
           Previous versions of XML::DOM would expand parameter entity references (like
           %pent;), so when printing the DTD, it would print the contents of the external
           entity, instead of the parameter entity reference.  With this release (1.27),
           you can prevent this by setting the XML::DOM::Parser options ParseParamEnt => 1
           and ExpandParamEnt => 0.

           When it is parsing the contents of the external entities, it *DOES* still add
           the nodes to the DocumentType, but it marks these nodes by setting the ’Hidden’
           property. In addition, it adds an EntityReference node to the DocumentType
           node.

           When printing the DocumentType node (or when using to_expat() or to_sax()), the
           ’Hidden’ nodes are suppressed, so you will see the parameter entity reference
           instead of the contents of the external entities. See test case t/dom_extent.t
           for an example.

           The reason for adding the ’Hidden’ nodes to the DocumentType node, is that the
           nodes may contain <!ENTITY> definitions that are referenced further in the
           document. (Simply not adding the nodes to the DocumentType could cause such
           entity references to be expanded incorrectly.)

           Note that you need XML::Parser 2.27 or higher for this to work correctly.
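
           Setting the two parser options could look like the sketch below. The file
           name doc.xml is hypothetical and stands for a document whose DTD references an
           external parameter entity, so the parsing lines are left commented out.

```perl
use strict;
use warnings;
use XML::DOM;

my $parser = XML::DOM::Parser->new(
    ParseParamEnt  => 1,    # parse parameter entities in the DTD
    ExpandParamEnt => 0,    # but print the %pent; reference, not its contents
);

# Hypothetical input file; uncomment when such a document is available.
# my $doc = $parser->parsefile("doc.xml");
# print $doc->getDoctype->toString;
# $doc->dispose;
```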

SEE ALSO
       The Japanese version of this document by Takanori Kawai (Hippo2000) at
       <http://member.nifty.ne.jp/hippo2000/perltips/xml/dom.htm>

       The DOM Level 1 specification at <http://www.w3.org/TR/REC-DOM-Level-1>

       The XML spec (Extensible Markup Language 1.0) at <http://www.w3.org/TR/REC-xml>

       The XML::Parser and XML::Parser::Expat manual pages.

CAVEATS
       The method getElementsByTagName() does not return a "live" NodeList.  Whether this
       is an actual caveat is debatable, but a few people on the www-dom mailing list
       seemed to think so. I haven’t decided yet. It’s a pain to implement, it slows
       things down and the benefits seem marginal.  Let me know what you think.
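
       The practical effect of a non-live NodeList can be sketched like this, with an
       invented document: a list obtained before a modification keeps its old contents,
       and a new call is needed to see the added element.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse("<list><item/><item/></list>");
my $root = $doc->getDocumentElement;

# Take a snapshot of the matching elements.
my $items  = $doc->getElementsByTagName("item");
my $before = $items->getLength;

# Add a third <item> after the list was obtained.
$root->appendChild($doc->createElement("item"));

my $stale = $items->getLength;                               # old list
my $fresh = $doc->getElementsByTagName("item")->getLength;   # new query

print "before: $before, old list: $stale, new query: $fresh\n";

$doc->dispose;
```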

       (To subscribe to the www-dom mailing list send an email with the subject
       "subscribe" to www-dom-request AT w3.org. I only look here occasionally, so don’t
       send bug reports or suggestions about XML::DOM to this list; send them to
       enno AT att.com instead.)

AUTHOR
       Send bug reports, hints, tips, suggestions to Enno Derksen at <enno AT att.com>.

       Thanks to Clark Cooper for his help with the initial version.



perl v5.8.5                       2000-01-31                       XML::DOM(3)
