NAME
    XML::LibXML::Simple - XML::LibXML clone of XML::Simple::XMLin()

INHERITANCE
     XML::LibXML::Simple
       is an Exporter

SYNOPSIS
      my $xml  = ...;  # filename, fh, string, or XML::LibXML-node

    Imperative:

      use XML::LibXML::Simple   qw(XMLin);
      my $data = XMLin $xml, %options;

    Or the Object Oriented way:

      use XML::LibXML::Simple   ();
      my $xs   = XML::LibXML::Simple->new(%options);
      my $data = $xs->XMLin($xml, %options);

DESCRIPTION
    This module is a blunt rewrite of XML::Simple (by Grant McLean) to use
    the XML::LibXML parser for XML structures, where the original uses plain
    Perl or SAX parsers.

    Be warned: this module tries to be smart. You may very well shoot
    yourself in the foot with this DWIMmery. Read the whole manual page at
    least once before you start using it. If your XML is described in a
    schema or WSDL, then use XML::Compile for maintainable code.

METHODS
  Constructors
    XML::LibXML::Simple->new(%options)
        Instantiate an object on which XMLin() can be called. You can
        provide %options to this constructor (to be reused for each call to
        XMLin) and with each call of XMLin (to be used once).

        For descriptions of the %options see the "DETAILS" section of this
        manual page.
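
        A minimal sketch of how the two sets of options combine, assuming
        the per-call options are merged over the constructor options for
        that call only. The XML string and the variable names are made up
        for illustration:

```perl
use strict;
use warnings;
use XML::LibXML::Simple ();

# ForceArray is given once and reused by every XMLin() call on $xs.
my $xs  = XML::LibXML::Simple->new(ForceArray => ['item']);
my $xml = '<opt><item>one</item></opt>';

# Only the constructor options apply here.
my $plain  = $xs->XMLin($xml);

# KeepRoot is used for this call only; ForceArray still comes from new().
my $rooted = $xs->XMLin($xml, KeepRoot => 1);

print "plain:  $plain->{item}[0]\n";
print "rooted: $rooted->{opt}{item}[0]\n";
```

        Keeping the shared options in the constructor avoids repeating
        them when the same kind of message is parsed many times.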

  Translators
    $obj->XMLin($xmldata, %options)
        For $xmldata and descriptions of the %options see the "DETAILS"
        section of this manual page.

FUNCTIONS
    The functions "XMLin" (exported implicitly) and "xml_in" (exported on
    request) simply call "XML::LibXML::Simple->new->XMLin()" with the
    provided parameters.
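
    A short sketch of both functions side by side, assuming they behave
    identically as described above. The XML string is only an example:

```perl
use strict;
use warnings;

# XMLin is exported by default; xml_in must be requested explicitly.
use XML::LibXML::Simple qw(XMLin xml_in);

my $xml = '<opt username="bob" password="flurp" />';

my $via_XMLin  = XMLin($xml);
my $via_xml_in = xml_in($xml);

print "XMLin:  $via_XMLin->{username}\n";
print "xml_in: $via_xml_in->{username}\n";
```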

DETAILS
  Parameter $xmldata
    As first parameter to XMLin(), you must provide the XML message to be
    translated into a Perl structure. Choose one of the following:

    A filename
        If the filename contains no directory components, "XMLin()" will
        look for the file in each directory in the SearchPath (see the
        "SearchPath" option below) and in the current directory. eg:

          $data = XMLin('/etc/params.xml', %options);

    A dash (-)
        Parse from STDIN.

          $data = XMLin('-', %options);

    undef
        [deprecated] If there is no XML specifier, "XMLin()" will check the
        script directory and each of the SearchPath directories for a file
        with the same name as the script but with the extension '.xml'.
        Note: if you wish to specify options, you must specify the value
        'undef'. eg:

          $data = XMLin(undef, ForceArray => 1);

        This feature is available for backwards compatibility with
        XML::Simple, but it is quite fragile: you can easily pick up the
        wrong XML file as input. Please do not use it: always use an
        explicit filename.

    A string of XML
        A string containing XML (recognised by the presence of '<' and '>'
        characters) will be parsed directly. eg:

          $data = XMLin('<opt username="bob" password="flurp" />', %options);

    An IO::Handle object
        In this case, XML::LibXML::Parser will read the XML data directly
        from the provided file.

          # $fh = IO::File->new('/etc/params.xml') or die;
          open my $fh, '<:encoding(utf8)', '/etc/params.xml' or die;

          $data = XMLin($fh, %options);

    An XML::LibXML::Document or ::Element
        [Not available in XML::Simple] When you have a pre-parsed
        XML::LibXML node, you can pass that.
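
        A minimal sketch of passing pre-parsed nodes, assuming the
        document is loaded with XML::LibXML's "load_xml". The element
        names "limits" and "cpu" are invented for the example:

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Simple qw(XMLin);

my $doc = XML::LibXML->load_xml(
    string => '<opt username="bob"><limits cpu="2" /></opt>',
);

# Pass the whole document: translation starts at its root element.
my $from_doc = XMLin($doc);

# Pass a single element: only that subtree is translated.
my ($limits) = $doc->documentElement->getChildrenByTagName('limits');
my $from_element = XMLin($limits);

print "from document: cpu=$from_doc->{limits}{cpu}\n";
print "from element:  cpu=$from_element->{cpu}\n";
```

        This is convenient when the document has already been parsed for
        other reasons, for instance to run XPath queries on it first.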

  Parameter %options
    XML::LibXML::Simple supports most options defined by XML::Simple, so the
    interface is quite compatible. Minor changes apply. This explanation is
    extracted from the XML::Simple manual-page.

    *   check out "ForceArray" because you'll almost certainly want to turn
        it on

    *   make sure you know what the "KeyAttr" option does and what its
        default value is because it may surprise you otherwise.

    *   Option names are case-insensitive, so you can use the mixed case
        versions shown here; you can add underscores between the words (eg:
        key_attr) if you like.
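
    As an illustration of the last point, the sketch below spells the same
    option in four ways and expects the same result each time. The XML
    string is made up for the example:

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt><user login="grep" /></opt>';

# Four spellings of the same option name.
my @spellings = (
    [ KeepRoot  => 1 ],
    [ keeproot  => 1 ],
    [ keep_root => 1 ],
    [ KEEP_ROOT => 1 ],
);

my @results = map { my @opt = @$_; XMLin($xml, @opt) } @spellings;

# Every result is expected to keep the root element "opt".
print "$_->{opt}{user}{login}\n" for @results;
```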

    In alphabetic order:

    ContentKey => 'keyname' *# seldom used*
        When text content is parsed to a hash value, this option lets you
        specify a name for the hash key to override the default 'content'.
        So for example:

          XMLin('<opt one="1">Two</opt>', ContentKey => 'text')

        will parse to:

          { one => 1, text => 'Two' }

        instead of:

          { one => 1, content => 'Two' }

        You can also prefix your selected key name with a '-' character to
        have "XMLin()" try a little harder to eliminate unnecessary
        'content' keys after array folding. For example:

          XMLin(
            '<opt><item name="one">First</item><item name="two">Second</item></opt>',
            KeyAttr => {item => 'name'},
            ForceArray => [ 'item' ],
            ContentKey => '-content'
          )

        will parse to:

          {
            item => {
              one => 'First',
              two => 'Second'
            }
          }

        rather than this (without the '-'):

          {
            item => {
              one => { content => 'First' },
              two => { content => 'Second' }
            }
          }

    ForceArray => 1 *# important*
        This option should be set to '1' to force nested elements to be
        represented as arrays even when there is only one. Eg, with
        ForceArray enabled, this XML:

            <opt>
              <name>value</name>
            </opt>

        would parse to this:

            { name => [ 'value' ] }

        instead of this (the default):

            { name => 'value' }

        This option is especially useful if the data structure is likely to
        be written back out as XML and the default behaviour of rolling
        single nested elements up into attributes is not desirable.

        If you are using the array folding feature, you should almost
        certainly enable this option. If you do not, single nested elements
        will not be parsed to arrays and therefore will not be candidates
        for folding to a hash. (Given that the default value of 'KeyAttr'
        enables array folding, this option should probably have been
        enabled by default as well.)
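
        The practical benefit is that calling code can treat "one" and
        "many" the same way. A minimal sketch, assuming elements that
        carry attributes; the helper "logins" and the XML strings are
        hypothetical:

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $one  = '<opt><user login="grep" /></opt>';
my $many = '<opt><user login="grep" /><user login="stty" /></opt>';

# Hypothetical helper: list all logins, however many users there are.
# 'login' is not one of the default KeyAttr names, so nothing is folded.
sub logins {
    my ($xml) = @_;
    my $data  = XMLin($xml, ForceArray => 1);
    return map { $_->{login} } @{ $data->{user} };
}

my @single = logins($one);
my @double = logins($many);

print "single: @single\n";
print "double: @double\n";
```

        Without ForceArray, the first document would yield a hash for
        "user" and the array dereference in the helper would fail.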

    ForceArray => [ names ] *# important*
        This alternative (and preferred) form of the 'ForceArray' option
        allows you to specify a list of element names which should always be
        forced into an array representation, rather than the 'all or
        nothing' approach above.

        It is also possible to include compiled regular expressions in the
        list: any element names which match the pattern will be forced to
        arrays. If the list contains only a single regex, then it is not
        necessary to enclose it in an arrayref. Eg:

          ForceArray => qr/_list$/
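
        A small sketch mixing a literal name and a regex in one list. The
        element names "server", "port_list" and "owner" are invented for
        the example:

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt>'
        . '<server>alpha</server>'
        . '<port_list>80</port_list>'
        . '<owner>bob</owner>'
        . '</opt>';

# 'server' is forced by name, 'port_list' by the pattern;
# 'owner' matches neither and stays a plain scalar.
my $data = XMLin($xml, ForceArray => [ 'server', qr/_list$/ ]);

print "server:    @{ $data->{server} }\n";
print "port_list: @{ $data->{port_list} }\n";
print "owner:     $data->{owner}\n";
```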

    ForceContent => 1 *# seldom used*
        When "XMLin()" parses elements which have text content as well as
        attributes, the text content must be represented as a hash value
        rather than a simple scalar. This option allows you to force text
        content to always parse to a hash value even when there are no
        attributes. So for example:

          XMLin('<opt><x>text1</x><y a="2">text2</y></opt>', ForceContent => 1)

        will parse to:

          {
            x => {         content => 'text1' },
            y => { a => 2, content => 'text2' }
          }

        instead of:

          {
            x => 'text1',
            y => { 'a' => 2, 'content' => 'text2' }
          }

    GroupTags => { grouping tag => grouped tag } *# handy*
        You can use this option to eliminate extra levels of indirection in
        your Perl data structure. For example this XML:

          <opt>
            <searchpath>
              <dir>/usr/bin</dir>
              <dir>/usr/local/bin</dir>
              <dir>/usr/X11/bin</dir>
            </searchpath>
          </opt>

        Would normally be read into a structure like this:

          {
            searchpath => {
               dir => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
            }
          }

        But when read in with the appropriate value for 'GroupTags':

          my $opt = XMLin($xml, GroupTags => { searchpath => 'dir' });

        It will return this simpler structure:

          {
            searchpath => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
          }

        The grouping element ("<searchpath>" in the example) must not
        contain any attributes or elements other than the grouped element.

        You can specify multiple 'grouping element' to 'grouped element'
        mappings in the same hashref. If this option is combined with
        "KeyAttr", the array folding will occur first and then the grouped
        element names will be eliminated.

    HookNodes => CODE
        Select document nodes to apply special tricks. Introduced in [0.96],
        not available in XML::Simple.

        When this option is provided, the CODE will be called once the XML
        DOM tree is ready to get transformed into Perl. Your CODE should
        return either "undef" (nothing to do) or a HASH which maps values of
        unique_key (see the XML::LibXML::Node method "unique_key") onto CODE
        references to be called.

        Once the translator from XML into Perl reaches a selected node, it
        will call your routine specific for that node. The triggering node
        is the only parameter. When you return "undef", the node will
        not be found in the final result. You may return any data (even the
        node itself) which will be included in the final result as is, under
        the name of the original node.

        Example:

           my $out = XMLin $file, HookNodes => \&protect_html;

           sub protect_html($$)
           {   # $obj is the instantiated XML::LibXML::Simple object
               # $xml is a XML::LibXML::Element to get transformed
               my ($obj, $xml) = @_;

               my %hooks;    # collects the table of hooks

               # do an xpath search for HTML
               my $xpc   = XML::LibXML::XPathContext->new($xml);
               my @nodes = $xpc->findnodes(...); #XXX
               @nodes or return undef;

               my $as_text = sub { $_[0]->toString(0) };  # as text
               #  $as_node = sub { $_[0] };               # as node
               #  $skip    = sub { undef };               # not at all

               # the same behavior for all xpath nodes, in this example
               $hooks{$_->unique_key} = $as_text
                   for @nodes;

               \%hooks;
           }
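
        A self-contained variant of the same idea, as a sketch only: the
        XPath expression "//body", the sample document and the name
        "keep_body_as_text" are assumptions chosen for illustration, not
        part of the module:

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt><title>Intro</title>'
        . '<body><p>Hello <b>world</b></p></body></opt>';

# Hypothetical hook builder: keep every <body> element as raw XML text
# instead of letting it be collapsed into a Perl structure.
sub keep_body_as_text {
    my ($obj, $root) = @_;
    my $as_text = sub { $_[0]->toString(0) };

    my %hooks;
    $hooks{ $_->unique_key } = $as_text for $root->findnodes('//body');

    return %hooks ? \%hooks : undef;
}

my $data = XMLin($xml, HookNodes => \&keep_body_as_text);

print "title: $data->{title}\n";
print "body:  $data->{body}\n";
```

        This is one way to protect mixed content, which would otherwise
        lose its element order during translation.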

    KeepRoot => 1 *# handy*
        In its attempt to return a data structure free of superfluous detail
        and unnecessary levels of indirection, "XMLin()" normally discards
        the root element name. Setting the 'KeepRoot' option to '1' will
        cause the root element name to be retained. So after executing this
        code:

          $config = XMLin('<config tempdir="/tmp" />', KeepRoot => 1)

        You'll be able to reference the tempdir as
        "$config->{config}->{tempdir}" instead of the default
        "$config->{tempdir}".

    KeyAttr => [ list ] *# important*
        This option controls the 'array folding' feature which translates
        nested elements from an array to a hash. It also controls the
        'unfolding' of hashes to arrays.

        For example, this XML:

            <opt>
              <user login="grep" fullname="Gary R Epstein" />
              <user login="stty" fullname="Simon T Tyson" />
            </opt>

        would, by default, parse to this:

            {
              user => [
                 { login    => 'grep',
                   fullname => 'Gary R Epstein'
                 },
                 { login    => 'stty',
                   fullname => 'Simon T Tyson'
                 }
              ]
            }

        If the option 'KeyAttr => "login"' were used to specify that the
        'login' attribute is a key, the same XML would parse to:

            {
              user => {
                 stty => { fullname => 'Simon T Tyson' },
                 grep => { fullname => 'Gary R Epstein' }
              }
            }

        The key attribute names should be supplied in an arrayref if there
        is more than one. "XMLin()" will attempt to match attribute names in
        the order supplied.

        Note 1: The default value for 'KeyAttr' is "['name', 'key', 'id']".
        If you do not want folding on input or unfolding on output you must
        set this option to an empty list to disable the feature.

        Note 2: If you wish to use this option, you should also enable the
        "ForceArray" option. Without 'ForceArray', a single nested element
        will be rolled up into a scalar rather than an array and therefore
        will not be folded (since only arrays get folded).
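
        The effect described in Note 2 can be sketched as follows, using
        a document with a single "user" element. The variable names are
        illustrative only:

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt><user login="grep" fullname="Gary R Epstein" /></opt>';

# Without ForceArray the single <user> is not an array, so it cannot
# be folded on its 'login' attribute.
my $unfolded = XMLin($xml, KeyAttr => { user => 'login' });

# With ForceArray the single <user> becomes a one-element array,
# which is then folded into a hash keyed on 'login'.
my $folded = XMLin($xml,
    KeyAttr    => { user => 'login' },
    ForceArray => [ 'user' ],
);

print "unfolded: $unfolded->{user}{login}\n";
print "folded:   $folded->{user}{grep}{fullname}\n";
```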

    KeyAttr => { list } *# important*
        This alternative (and preferred) method of specifying the key
        attributes allows more fine grained control over which elements are
        folded and on which attributes. For example the option 'KeyAttr => {
        package => "id" }' will cause any package elements to be folded on
        the 'id' attribute. No other elements which have an 'id' attribute
        will be folded at all.

        Two further variations are made possible by prefixing a '+' or a '-'
        character to the attribute name:

        The option 'KeyAttr => { user => "+login" }' will cause this XML:

            <opt>
              <user login="grep" fullname="Gary R Epstein" />
              <user login="stty" fullname="Simon T Tyson" />
            </opt>

        to parse to this data structure:

            {
              user => {
                 stty => {
                    fullname => 'Simon T Tyson',
                    login    => 'stty'
                 },
                 grep => {
                    fullname => 'Gary R Epstein',
                    login    => 'grep'
                 }
              }
            }

        The '+' indicates that the value of the key attribute should be
        copied rather than moved to the folded hash key.

        A '-' prefix would produce this result:

            {
              user => {
                 stty => {
                    fullname => 'Simon T Tyson',
                    -login   => 'stty'
                 },
                 grep => {
                    fullname => 'Gary R Epstein',
                    -login   => 'grep'
                 }
              }
            }

    NoAttr => 1 *# handy*
        When used with "XMLin()", any attributes in the XML will be ignored.
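
        A brief sketch of the difference; the attribute "version" and the
        element "title" are made up for the example:

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt version="1.0"><title>demo</title></opt>';

my $with_attrs    = XMLin($xml);               # attributes included
my $without_attrs = XMLin($xml, NoAttr => 1);  # attributes ignored

print "with:    ", join(',', sort keys %$with_attrs), "\n";
print "without: ", join(',', sort keys %$without_attrs), "\n";
```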

    NormaliseSpace => 0 | 1 | 2 *# handy*
        This option controls how whitespace in text content is handled.
        Recognised values for the option are:

        "0" (default) whitespace is passed through unaltered (except of
            course for the normalisation of whitespace in attribute values
            which is mandated by the XML recommendation)

        "1" whitespace is normalised in any value used as a hash key
            (normalising means removing leading and trailing whitespace and
            collapsing sequences of whitespace characters to a single space)

        "2" whitespace is normalised in all text content

        Note: you can spell this option with a 'z' if that is more natural
        for you.

    Parser => OBJECT
        You may pass your own XML::LibXML object, instead of having one
        created for you. This is useful when you need specific configuration
        on that object (See XML::LibXML::Parser) or have implemented your
        own extension to that object.

        The internally created parser object is configured in safe mode.
        Read the XML::LibXML::Parser manual about security issues with
        certain parameter settings. The default of XML::LibXML itself is
        unsafe!

    ParserOpts => HASH|ARRAY
        Pass parameters to the creation of a new internal parser object. You
        can overrule the options which will create a safe parser. It may be
        more readable to use the "Parser" parameter.
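
        A minimal sketch of both routes. The parser settings shown
        ("no_network", "expand_entities", "load_ext_dtd", "no_blanks")
        are existing XML::LibXML parser options, but this particular
        selection is only an illustration and not a complete hardening
        recipe:

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Simple qw(XMLin);

my $xml = '<opt username="bob" />';

# Route 1: bring your own, explicitly configured parser object.
my $parser = XML::LibXML->new(
    no_network      => 1,   # never fetch remote resources
    expand_entities => 0,   # leave entity references unexpanded
    load_ext_dtd    => 0,   # do not load external DTDs
);
my $via_parser = XMLin($xml, Parser => $parser);

# Route 2: adjust the internally created parser.
my $via_opts = XMLin($xml, ParserOpts => { no_blanks => 1 });

print "Parser:     $via_parser->{username}\n";
print "ParserOpts: $via_opts->{username}\n";
```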

    SearchPath => [ list ] *# handy*
        If you pass "XMLin()" a filename, but the filename includes no
        directory component, you can use this option to specify which
        directories should be searched to locate the file. You might use
        this option to search first in the user's home directory, then in a
        global directory such as /etc.

        If a filename is provided to "XMLin()" but SearchPath is not
        defined, the file is assumed to be in the current directory.

        If the first parameter to "XMLin()" is undefined, the default
        SearchPath will contain only the directory in which the script
        itself is located. Otherwise the default SearchPath will be empty.
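
        A self-contained sketch of a search over two directories, using
        temporary directories so that it can run anywhere. The file name
        "searchpath_demo.xml" is hypothetical:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use XML::LibXML::Simple qw(XMLin);

# Two temporary directories: only the second one contains the file.
my $empty_dir  = tempdir(CLEANUP => 1);
my $config_dir = tempdir(CLEANUP => 1);

my $path = File::Spec->catfile($config_dir, 'searchpath_demo.xml');
open my $out, '>', $path or die "cannot write $path: $!";
print {$out} '<opt username="bob" />';
close $out or die "cannot close $path: $!";

# The bare file name is looked up in each directory, in order.
my $data = XMLin('searchpath_demo.xml',
    SearchPath => [ $empty_dir, $config_dir ],
);

print "username: $data->{username}\n";
```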

    SuppressEmpty => 1 | '' | undef
        [0.99] What to do with empty elements (no attributes and no
        content). The default behaviour is to represent them as empty
        hashes. Setting this option to a true value (eg: 1) will cause empty
        elements to be skipped altogether. Setting the option to 'undef' or
        the empty string will cause empty elements to be represented as the
        undefined value or the empty string respectively.

    ValueAttr => [ names ] *# handy*
        Use this option to deal with elements which always have a single
        attribute and no content. Eg:

          <opt>
            <colour value="red" />
            <size   value="XXL" />
          </opt>

        Setting "ValueAttr => [ 'value' ]" will cause the above XML to parse
        to:

          {
            colour => 'red',
            size   => 'XXL'
          }

        instead of this (the default):

          {
            colour => { value => 'red' },
            size   => { value => 'XXL' }
          }

    NsExpand => 0 *advised*
        When name-spaces are used, the default behavior is to include the
        prefix in the key name. However, this is very dangerous: the
        prefixes can be changed without a change of the XML message meaning.
        Therefore, you are better off using this "NsExpand" option. The
        downside, however, is that the labels get very long.

        Without this option:

          <record xmlns:x="http://xyz">
            <x:field1>42</x:field1>
          </record>
          <record xmlns:y="http://xyz">
            <y:field1>42</y:field1>
          </record>

        translates into

          { 'x:field1' => 42 }
          { 'y:field1' => 42 }

        but both source components have exactly the same meaning. When
        "NsExpand" is used, the result is:

          { '{http://xyz}field1' => 42 }
          { '{http://xyz}field1' => 42 }

        Of course, addressing these fields is more work. It is advised to
        implement it like this:

          my $ns = 'http://xyz';
          $data->{"{$ns}field1"};
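
        Put together as a runnable sketch, with the two records from
        above held in separate strings. The variable names are
        illustrative:

```perl
use strict;
use warnings;
use XML::LibXML::Simple qw(XMLin);

my $ns = 'http://xyz';

# Same namespace, different prefixes.
my @records = (
    '<record xmlns:x="http://xyz"><x:field1>42</x:field1></record>',
    '<record xmlns:y="http://xyz"><y:field1>42</y:field1></record>',
);

# With NsExpand the lookup key does not depend on the prefix used.
my @values = map {
    my $data = XMLin($_, NsExpand => 1);
    $data->{"{$ns}field1"};
} @records;

print "values: @values\n";
```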

    NsStrip => 0 *sloppy coding*
        [not available in XML::Simple] Namespaces are really important to
        avoid name collisions, but they are a bit of a hassle. To do it
        correctly, use option "NsExpand". To do it sloppily, use "NsStrip".
        With this option set, the above example will return

          { field1 => 42 }
          { field1 => 42 }

EXAMPLES
    When "XMLin()" reads the following very simple piece of XML:

        <opt username="testuser" password="frodo"></opt>

    it returns the following data structure:

        {
          username => 'testuser',
          password => 'frodo'
        }

    The identical result could have been produced with this alternative XML:

        <opt username="testuser" password="frodo" />

    Or this (although see 'ForceArray' option for variations):

        <opt>
          <username>testuser</username>
          <password>frodo</password>
        </opt>

    Repeated nested elements are represented as anonymous arrays:

        <opt>
          <person firstname="Joe" lastname="Smith">
            <email>joe@smith.com</email>
            <email>jsmith@yahoo.com</email>
          </person>
          <person firstname="Bob" lastname="Smith">
            <email>bob@smith.com</email>
          </person>
        </opt>

        {
          person => [
            { email     => [ 'joe@smith.com', 'jsmith@yahoo.com' ],
              firstname => 'Joe',
              lastname  => 'Smith'
            },
            { email     => 'bob@smith.com',
              firstname => 'Bob',
              lastname  => 'Smith'
            }
          ]
        }

    Nested elements with a recognised key attribute are transformed (folded)
    from an array into a hash keyed on the value of that attribute (see the
    "KeyAttr" option):

        <opt>
          <person key="jsmith" firstname="Joe" lastname="Smith" />
          <person key="tsmith" firstname="Tom" lastname="Smith" />
          <person key="jbloggs" firstname="Joe" lastname="Bloggs" />
        </opt>

        {
          person => {
             jbloggs => {
                firstname => 'Joe',
                lastname  => 'Bloggs'
             },
             tsmith  => {
                firstname => 'Tom',
                lastname  => 'Smith'
             },
             jsmith => {
                firstname => 'Joe',
                lastname => 'Smith'
             }
          }
        }

    The <anon> tag can be used to form anonymous arrays:

        <opt>
          <head><anon>Col 1</anon><anon>Col 2</anon><anon>Col 3</anon></head>
          <data><anon>R1C1</anon><anon>R1C2</anon><anon>R1C3</anon></data>
          <data><anon>R2C1</anon><anon>R2C2</anon><anon>R2C3</anon></data>
          <data><anon>R3C1</anon><anon>R3C2</anon><anon>R3C3</anon></data>
        </opt>

        {
          head => [ [ 'Col 1', 'Col 2', 'Col 3' ] ],
          data => [ [ 'R1C1', 'R1C2', 'R1C3' ],
                    [ 'R2C1', 'R2C2', 'R2C3' ],
                    [ 'R3C1', 'R3C2', 'R3C3' ]
                  ]
        }

    Anonymous arrays can be nested to arbitrary levels and as a special
    case, if the surrounding tags for an XML document contain only an
    anonymous array the arrayref will be returned directly rather than the
    usual hashref:

        <opt>
          <anon><anon>Col 1</anon><anon>Col 2</anon></anon>
          <anon><anon>R1C1</anon><anon>R1C2</anon></anon>
          <anon><anon>R2C1</anon><anon>R2C2</anon></anon>
        </opt>

        [
          [ 'Col 1', 'Col 2' ],
          [ 'R1C1', 'R1C2' ],
          [ 'R2C1', 'R2C2' ]
        ]

    Elements which only contain text content will simply be represented as a
    scalar. Where an element has both attributes and text content, the
    element will be represented as a hashref with the text content in the
    'content' key (see the "ContentKey" option):

      <opt>
        <one>first</one>
        <two attr="value">second</two>
      </opt>

      {
        one => 'first',
        two => { attr => 'value', content => 'second' }
      }

    Mixed content (elements which contain both text content and nested
    elements) will not be represented in a useful way: element order and
    significant whitespace will be lost. If you need to work with mixed
    content, then this module (like XML::Simple) is not the right tool for
    your job.

  Differences to XML::Simple
    In general, the output and the options are equivalent, although this
    module has some differences with XML::Simple to be aware of.

    only XMLin() is supported
        If you want to write XML then use a schema (for instance with
        XML::Compile). Do not attempt to create XML by hand! If you still
        think you need it, then have a look at XMLout() as implemented by
        XML::Simple or any of a zillion template systems.

    no "variables" option
        IMO, you should use a templating system if you want variables
        filled-in in the input: it is not a task for this module.

    ForceArray options
        There are a few small differences in the result of the "forcearray"
        option, because XML::Simple seems to behave inconsistently.

    hooks
        XML::Simple does not support hooks.

SEE ALSO
    XML::Compile for processing XML when a schema is available. When you
    have a schema, the data and structure of your message get validated.

    XML::Simple, the original implementation whose interface is followed as
    closely as possible.

COPYRIGHTS
    The interface design and large parts of the documentation were taken
    from the XML::Simple module, written by Grant McLean <grantm@cpan.org>.

    Copyrights 2008-2020 on the Perl code and the related documentation by
    Mark Overmeer <markov@cpan.org>. For other contributors see
    ChangeLog.

    This program is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself. See http://dev.perl.org/licenses/