{
    "mode": "perldoc",
    "parameter": "HTML::Tree::AboutTrees",
    "section": "",
    "url": "https://www.chedong.com/phpMan.php/perldoc/HTML%3A%3ATree%3A%3AAboutTrees/json",
    "generated": "2026-06-11T13:10:01Z",
    "synopsis": "# This an article, not a module.",
    "sections": {
        "NAME": {
            "content": "HTML::Tree::AboutTrees -- article on tree-shaped data structures in Perl\n",
            "subsections": []
        },
        "SYNOPSIS": {
            "content": "# This an article, not a module.\n",
            "subsections": []
        },
        "DESCRIPTION": {
            "content": "The following article by Sean M. Burke first appeared in *The Perl Journal* #18 and is copyright\n2000 The Perl Journal. It appears courtesy of Jon Orwant and The Perl Journal. This document may\nbe distributed under the same terms as Perl itself.\n",
            "subsections": []
        },
        "Trees": {
            "content": "-- Sean M. Burke\n\n\"AaaAAAaauugh! Watch out for that tree!\" -- *George of the Jungle theme*\n\nPerl's facility with references, combined with its automatic management of memory allocation,\nmakes it straightforward to write programs that store data in structures of arbitrary form and\ncomplexity.\n\nBut I've noticed that many programmers, especially those who started out with more restrictive\nlanguages, seem at home with complex but uniform data structures -- N-dimensional arrays, or\nmore struct-like things like hashes-of-arrays(-of-hashes(-of-hashes), etc.) -- but they're often\nuneasy with building more freeform, less tabular structures, like tree-shaped data structures.\n\nBut trees are easy to build and manage in Perl, as I'll demonstrate by showing off how the\nHTML::Element class manages elements in an HTML document tree, and by walking you through a\nfrom-scratch implementation of game trees. But first we need to nail down what we mean by a\n\"tree\".\n\nSocratic Dialogues: \"What is a Tree?\"\nMy first brush with tree-shaped structures was in linguistics classes, where tree diagrams are\nused to describe the syntax underlying natural language sentences. After learning my way around\n*those* trees, I started to wonder -- are what I'm used to calling \"trees\" the same as what\nprogrammers call \"trees\"? So I asked lots of helpful and patient programmers how they would\ndefine a tree. Many replied with a answer in jargon that they could not really explain\n(understandable, since explaining things, especially defining things, is harder than people\nthink):\n\n-- So what *is* a \"tree\", a tree-shaped data structure?\n\n-- A tree is a special case of an acyclic directed graph!\n\n-- What's a \"graph\"?\n\n-- Um... lines... and... you draw it... with... arcs! nodes! um...\n\nThe most helpful were folks who couldn't explain directly, but with whom I could get into a\nrather Socratic dialog (where *I* asked the half-dim half-earnest questions), often with much\ndoodling of illustrations...\n\nQuestion: so what's a tree?\n\nAnswer: A tree is a collection of nodes that are linked together in a, well, tree-like way! Like\nthis *[drawing on a napkin]:*\n\nA\n/ \\\nB   C\n/ | \\\nD  E  F\n\nQ: So what do these letters represent?\n\nA: Each is a different node, a bunch of data. Maybe C is a bunch of data that stores a number,\nmaybe a hash table, maybe nothing at all besides the fact that it links to D, E, and F (which\nare other nodes).\n\nQ: So what're the lines between the nodes?\n\nA: Links. Also called \"arcs\". They just symbolize the fact that each node holds a list of nodes\nit links to.\n\nQ: So what if I draw nodes and links, like this...\n\nB -- E\n/ \\  / \\\nA   C\n\\ /\nE\n\nIs that still a tree?\n\nA: No, not at all. There's a lot of un-treelike things about that. First off, E has a link\ncoming off of it going into nowhere. You can't have a link to nothing -- you can only link to\nanother node. Second off, I don't know what that sideways link between B and E means...\n\nQ: Okay, let's work our way up from something simpler. Is this a tree...?\n\nA\n\nA: Yes, I suppose. It's a tree of just one node.\n\nQ: And how about...\n\nA\n\nB\n\nA: No, you can't just have nodes floating there, unattached.\n\nQ: Okay, I'll link A and B. How's this?\n\nA\n|\nB\n\nA: Yup, that's a tree. There's a node A, and a node B, and they're linked.\n\nQ: How is that tree any different from this one...?\n\nB\n|\nA\n\nA: Well, in both cases A and B are linked. But it's in a different direction.\n\nQ: Direction? What does the direction mean?\n\nA: Well, it depends what the tree represents. If it represents a categorization, like this:\n\ncitrus\n/    |    \\\norange  lemon  kumquat ...\n\nthen you mean to say that oranges, lemons, kumquats, etc., are a kind of citrus. But if you drew\nit upside down, you'd be saying, falsely, that citrus is a kind of kumquat, a kind of lemon, and\na kind of orange. If the tree represented cause-and-effect (or at least what situations could\nfollow others), or represented what's a part of what, you wouldn't want to get those backwards,\neither. So with the nodes you draw together on paper, one has to be over the other, so you can\ntell which way the relationship in the tree works.\n\nQ: So are these two trees the same?\n\nA          A\n/ \\        / \\\nB   C      B   \\\nC\n\nA: Yes, although by convention we often try to line up things in the same generation, like it is\nin the diagram on the left.\n\nQ: \"generation\"? This is a family tree?\n\nA: No, not unless it's a family tree for just yeast cells or something else that reproduces\nasexually. But for sake of having lots of terms to use, we just pretend that links in the tree\nrepresent the \"is a child of\" relationship, instead of \"is a kind of\" or \"is a part of\", or\n\"could result from\", or whatever the real relationship is. So we get to borrow a lot of kinship\nwords for describing trees -- B and C are \"children\" (or \"daughters\") of A; A is the \"parent\"\n(or \"mother\") of B and C. Node C is a \"sibling\" (or \"sister\") of node C; and so on, with terms\nlike \"descendants\" (a node's children, children's children, etc.), and \"generation\" (all the\nnodes at the same \"level\" in the tree, i.e., are either all grandchildren of the top node, or\nall great-grand-children, etc.), and \"lineage\" or \"ancestors\" (parents, and parent's parents,\netc., all the way to the topmost node).\n\nSo then we get to express rules in terms like \"A node cannot have more than one parent\", which\nmeans that this is not a valid tree:\n\nA\n/ \\\nB   C\n\\ /\nE\n\nAnd: \"A node can't be its own parent\", which excludes this looped-up connection:\n\n/\\\nA  |\n\\/\n\nOr, put more generally: \"A node can't be its own ancestor\", which excludes the above loop, as\nwell as the one here:\n\n/\\\nZ  |\n/   |\nA    |\n/ \\   |\nB   C  |\n\\/\n\nThat tree is excluded because A is a child of Z, and Z is a child of C, and C is a child of A,\nwhich means A is its own great-grandparent. So this whole network can't be a tree, because it\nbreaks the sort of meta-rule: once any node in the supposed tree breaks the rules for trees, you\ndon't have a tree anymore.\n\nQ: Okay, now, are these two trees the same?\n\nA         A\n/ | \\     / | \\\nB  C  D   D  C  B\n\nA: It depends whether you're basing your concept of trees on each node having a set (unordered\nlist) of children, or an (ordered) list of children. It's a question of whether ordering is\nimportant for what you're doing. With my diagram of citrus types, ordering isn't important, so\nthese tree diagrams express the same thing:\n\ncitrus\n/    |    \\\norange  lemon  kumquat\n\ncitrus\n/     |    \\\nkumquat  orange  lemon\n\nbecause it doesn't make sense to say that oranges are \"before\" or \"after\" kumquats in the whole\nbotanical scheme of things. (Unless, of course, you *are* using ordering to mean something, like\na degree of genetic similarity.)\n\nBut consider a tree that's a diagram of what steps are comprised in an activity, to some degree\nof specificity:\n\nmake tea\n/    |     \\\npour     infuse   serve\nhot water    / \\\nin cup/pot  /     \\\nadd     let\ntea     sit\nleaves\n\nThis means that making tea consists of putting hot water in a cup or put, infusing it (which\nitself consists of adding tea leaves and letting it sit), then serving it -- *in that order*. If\nyou serve an empty dry pot (sipping from empty cups, etc.), let it sit, add tea leaves, and pour\nin hot water, then what you're doing is performance art, not tea preparation:\n\nperformance\nart\n/    |     \\\nserve   infuse    pour\n/ \\       hot water\n/     \\      in cup/pot\nlet     add\nsit     tea\nleaves\n\nExcept for my having renamed the root, this tree is the same as the making-tea tree as far as\nwhat's under what, but it differs in order, and what the tree means makes the order important.\n\nQ: Wait -- \"root\"? What's a root?\n\nA: Besides kinship terms like \"mother\" and \"daughter\", the jargon for tree parts also has terms\nfrom real-life tree parts: the part that everything else grows from is called the root; and\nnodes that don't have nodes attached to them (i.e., childless nodes) are called \"leaves\".\n\nQ: But you've been drawing all your trees with the root at the top and leaves at the bottom.\n\nA: Yes, but for some reason, that's the way everyone seems to think of trees. They can draw\ntrees as above; or they can draw them sort of sideways with indenting representing what nodes\nare children of what:\n\n* make tea\n* pour hot water in cup/pot\n* infuse\n* add tea leaves\n* let sit\n* serve\n\n...but folks almost never seem to draw trees with the root at the bottom. So imagine it's based\non spider plant in a hanging pot. Unfortunately, spider plants *aren't* botanically trees,\nthey're plants; but \"spider plant diagram\" is rather a mouthful, so let's just call them trees.\n",
            "subsections": [
                {
                    "name": "Trees Defined Formally",
                    "content": "In time, I digested all these assorted facts about programmers' ideas of trees (which turned out\nto be just a more general case of linguistic ideas of trees) into a single rule:\n\n* A node is an item that contains (\"is over\", \"is parent of\", etc.) zero or more other nodes.\n\nFrom this you can build up formal definitions for useful terms, like so:\n\n* A node's descendants are defined as all its children, and all their children, and so on. Or,\nstated recursively: a node's descendants are all its children, and all its children's\ndescendants. (And if it has no children, it has no descendants.)\n\n* A node's ancestors consist of its parent, and its parent's parent, etc, up to the root. Or,\nrecursively: a node's ancestors consist of its parent and its parent's ancestors. (If it has no\nparent, it has no ancestors.)\n\n* A tree is a root node and all the root's descendants.\n\nAnd you can add a proviso or two to clarify exactly what I impute to the word \"other\" in \"other\nnodes\":\n\n* A node cannot contain itself, or contain any node that contains it, etc. Looking at it the\nother way: a node cannot be its own parent or ancestor.\n\n* A node can be root (i.e., no other node contains it) or can be contained by only one parent;\nno node can be the child of two or more parents.\n\nAdd to this the idea that children are sometimes ordered, and sometimes not, and that's about\nall you need to know about defining what a tree is. From there it's a matter of using them.\n"
                },
                {
                    "name": "Markup Language Trees: HTML-Tree",
                    "content": "While not *all* markup languages are inherently tree-like, the best-known family of markup\nlanguages, HTML, SGML, and XML, are about as tree-like as you can get. In these languages, a\ndocument consists of elements and character data in a tree structure where there is one root\nelement, and elements can contain either other elements, or character data.\n\nFootnote: For sake of simplicity, I'm glossing over comments (<!-- ... -->), processing\ninstructions (<?xml version='1.0'>), and declarations (<!ELEMENT ...>, <!DOCTYPE ...>). And\nI'm not bothering to distinguish entity references (&lt;, &#64;) or CDATA sections\n(<![CDATA[ ...]]>) from normal text.\n\nFor example, consider this HTML document:\n\n<html lang=\"en-US\">\n<head>\n<title>\nBlank Document!\n</title>\n</head>\n<body bgcolor=\"#d010ff\">\nI've got\n<em>\nsomething to saaaaay\n</em>\n!\n</body>\n</html>\n\nI've indented this to point out what nodes (elements or text items) are children of what, with\neach node on a line of its own.\n\nThe HTML::TreeBuilder module (in the CPAN distribution HTML-Tree) does the work of taking HTML\nsource and building in memory the tree that the document source represents.\n\nFootnote: it requires the HTML::Parser module, which tokenizes the source -- i.e.,\nidentifies each tag, bit of text, comment, etc.\n\nThe trees structures that it builds represent bits of text with normal Perl scalar string\nvalues; but elements are represented with objects -- that is, chunks of data that belong to a\nclass (in this case, HTML::Element), a class that provides methods (routines) for accessing the\npieces of data in each element, and otherwise doing things with elements. (See my article in\nTPJ#17 for a quick explanation of objects, the POD document \"perltoot\" for a longer explanation,\nor Damian Conway's excellent book *Object-Oriented Perl* for the full story.)\n\nEach HTML::Element object contains a number of pieces of data:\n\n* its element name (\"html\", \"h1\", etc., accessed as $element->tag)\n\n* a list of elements (or text segments) that it contains, if any (accessed as\n$element->contentlist or $element->content, depending on whether you want a list, or an\narrayref)\n\n* what element, if any, contains it (accessed as $element->parent)\n\n* and any SGML attributes that the element has, such as \"lang=\"en-US\"\", \"align=\"center\"\", etc.\n(accessed as $element->attr('lang'), $element->attr('center'), etc.)\n\nSo, for example, when HTML::TreeBuilder builds the tree for the above HTML document source, the\nobject for the \"body\" element has these pieces of data:\n\n* element name: \"body\"\n* nodes it contains:\nthe string \"I've got \"\nthe object for the \"em\" element\nthe string \"!\"\n* its parent:\nthe object for the \"html\" element\n* bgcolor: \"#d010ff\"\n\nNow, once you have this tree of objects, almost anything you'd want to do with it starts with\nsearching the tree for some bit of information in some element.\n\nAccessing a piece of information in, say, a hash of hashes of hashes, is straightforward:\n\n$password{'sean'}{'sburke1'}{'hpux'}\n\nbecause you know that all data points in that structure are accessible with that syntax, but\nwith just different keys. Now, the \"em\" element in the above HTML tree does happen to be\naccessible as the root's child #1's child #1:\n\n$root->content->[1]->content->[1]\n\nBut with trees, you typically don't know the exact location (via indexes) of the data you're\nlooking for. Instead, finding what you want will typically involve searching through the tree,\nseeing if every node is the kind you want. Searching the whole tree is simple enough -- look at\na given node, and if it's not what you want, look at its children, and so on. HTML-Tree provides\nseveral methods that do this for you, such as \"findbytagname\", which returns the elements (or\nthe first element, if called in scalar context) under a given node (typically the root) whose\ntag name is whatever you specify.\n\nFor example, that \"em\" node can be found as:\n\nmy $thatem = $root->findbytagname('em');\n\nor as:\n\n@ems = $root->findbytagname('em');\n# will only have one element for this particular tree\n\nNow, given an HTML document of whatever structure and complexity, if you wanted to do something\nlike change every\n\n<em>*stuff*</em>\n\nto\n\n<em class=\"funky\"> <b>[-</b> *stuff* <b>-]</b> </em>\n\nthe first step is to frame this operation in terms of what you're doing to the tree. You're\nchanging this:\n\nem\n|\n...\n\nto this:\n\nem\n/  |  \\\nb  ...   b\n|        |\n\"[-\"     \"-]\"\n\nIn other words, you're finding all elements whose tag name is \"em\", setting its class attribute\nto \"funky\", and adding one child to the start of its content list -- a new \"b\" element whose\ncontent is the text string \"[-\" -- and one to the end of its content list -- a new \"b\" element\nwhose content is the text string \"-]\".\n\nOnce you've got it in these terms, it's just a matter of running to the HTML::Element\ndocumentation, and coding this up with calls to the appropriate methods, like so:\n\nuse HTML::Element 1.53;\nuse HTML::TreeBuilder 2.96;\n# Build the tree by parsing the document\nmy $root = HTML::TreeBuilder->new;\n$root->parsefile('whatever.html'); # source file\n\n# Now make new nodes where needed\nforeach my $em ($root->findbytagname('em')) {\n$em->attr('class', 'funky'); # Set that attribute\n\n# Make the two new B nodes\nmy $new1 = HTML::Element->new('b');\nmy $new2 = HTML::Element->new('b');\n# Give them content (they have none at first)\n$new1->pushcontent('[-');\n$new2->pushcontent('-]');\n\n# And put 'em in place!\n$em->unshiftcontent($new1);\n$em->pushcontent($new2);\n}\nprint\n\"<!-- Looky see what I did! -->\\n\",\n$root->asHTML(), \"\\n\";\n\nThe class HTML::Element provides just about every method I can image you needing, for\nmanipulating trees made of HTML::Element objects. (And what it doesn't directly provide, it will\ngive you the components to build it with.)\n"
                },
                {
                    "name": "Building Your Own Trees",
                    "content": "Theoretically, any tree is pretty much like any other tree, so you could use HTML::Element for\nanything you'd ever want to do with tree-arranged objects. However, as its name implies,\nHTML::Element is basically *for* HTML elements; it has lots of features that make sense only for\nHTML elements (like the idea that every element must have a tag-name). And it lacks some\nfeatures that might be useful for general applications -- such as any sort of checking to make\nsure that you're not trying to arrange objects in a non-treelike way. For a general-purpose tree\nclass that does have such features, you can use Tree::DAGNode, also available from CPAN.\n\nHowever, if your task is simple enough, you might find it overkill to bother using\nTree::DAGNode. And, in any case, I find that the best way to learn how something works is to\nimplement it (or something like it, but simpler) yourself. So I'll here discuss how you'd\nimplement a tree structure, *without* using any of the existing classes for tree nodes.\n"
                },
                {
                    "name": "Implementation: Game Trees for Alak",
                    "content": "Suppose that the task at hand is to write a program that can play against a human opponent at a\nstrategic board game (as opposed to a board game where there's an element of chance). For most\nsuch games, a \"game tree\" is an essential part of the program (as I will argue, below), and this\nwill be our test case for implementing a tree structure from scratch.\n\nFor sake of simplicity, our game is not chess or backgammon, but instead a much simpler game\ncalled Alak. Alak was invented by the mathematician A. K. Dewdney, and described in his 1984\nbook *Planiverse*. The rules of Alak are simple:\n\nFootnote: Actually, I'm describing only my interpretation of the rules Dewdney describes in\n*Planiverse*. Many other interpretations are possible.\n\n* Alak is a two-player game played on a one-dimensional board with eleven slots on it. Each slot\ncan hold at most one piece at a time. There's two kinds of pieces, which I represent here as \"x\"\nand \"o\" -- x's belong to one player (called X), o's to the other (called O).\n\n* The initial configuration of the board is:\n\nxxxxoooo\n\nFor sake of the article, the slots are numbered from 1 (on the left) to 11 (on the right), and X\nalways has the first move.\n\n* The players take turns moving. At each turn, each player can move only one piece, once. (This\nunlike checkers, where you move one piece per move but get to keep moving it if you jump an your\nopponent's piece.) A player cannot pass up on his turn. A player can move any one of his pieces\nto the next unoccupied slot to its right or left, which may involve jumping over occupied slots.\nA player cannot move a piece off the side of the board.\n\n* If a move creates a pattern where the opponent's pieces are surrounded, on both sides, by two\npieces of the mover's color (with no intervening unoccupied blank slot), then those surrounded\npieces are removed from the board.\n\n* The goal of the game is to remove all of your opponent's pieces, at which point the game ends.\nRemoving all-but-one ends the game as well, since the opponent can't surround you with one\npiece, and so will always lose within a few moves anyway.\n\nConsider, then, this rather short game where X starts:\n\nxxxxoooo\n^         Move 1: X moves from 3 (shown with caret) to 5\n(Note that any of X's pieces could move, but\nthat the only place they could move to is 5.)\nxxxxoooo\n^   Move 2: O moves from 9 to 7.\nxxxxoooo\n^        Move 3: X moves from 4 to 6.\nxxxxoooo\n^  Move 4: O (stupidly) moves from 10 to 9.\nxxxxoooo\n^       Move 5: X moves from 5 to 10, making the board\n\"xxxoooxo\".  The three o's that X just\nsurrounded are removed.\nxxxxo\nO has only one piece, so has lost.\n\nNow, move 4 could have gone quite the other way:\n\nxxxxoooo\nMove 4: O moves from 8 to 4, making the board\n\"xxoxxooo\".  The surrounded x's are removed.\nxxoooo\n^           Move 5: X moves from 1 to 2.\nxxoooo\n^     Move 6: O moves from 7 to 6.\nxxoooo\n^          Move 7: X moves from 2 to 5, removing the o at 4.\nxxooo\n...and so on.\n\nTo teach a computer program to play Alak (as player X, say), it needs to be able to look at the\nconfiguration of the board, figure out what moves it can make, and weigh the benefit or costs,\nimmediate or eventual, of those moves.\n\nSo consider the board from just before move 3, and figure all the possible moves X could make. X\nhas pieces in slots 1, 2, 4, and 5. The leftmost two x's (at 1 and 2) are up against the end of\nthe board, so they can move only right. The other two x's (at 4 and 5) can move either right or\nleft:\n\nStarting board: xxxxoooo\nmoving 1 to 3 gives xxxxoooo\nmoving 2 to 3 gives xxxxoooo\nmoving 4 to 3 gives xxxxoooo\nmoving 5 to 3 gives xxxxoooo\nmoving 4 to 6 gives xxxxoooo\nmoving 5 to 6 gives xxxxoooo\n\nFor the computer to decide which of these is the best move to make, it needs to quantify the\nbenefit of these moves as a number -- call that the \"payoff\". The payoff of a move can be\nfigured as just the number of x pieces removed by the most recent move, minus the number of o\npieces removed by the most recent move. (It so happens that the rules of the game mean that no\nmove can delete both o's and x's, but the formula still applies.) Since none of these moves\nremoved any pieces, all these moves have the same immediate payoff: 0.\n\nNow, we could race ahead and write an Alak-playing program that could use the immediate payoff\nto decide which is the best move to make. And when there's more than one best move (as here,\nwhere all the moves are equally good), it could choose randomly between the good alternatives.\nThis strategy is simple to implement; but it makes for a very dumb program. Consider what O's\nresponse to each of the potential moves (above) could be. Nothing immediately suggests itself\nfor the first four possibilities (X having moved something to position 3), but either of the\nlast two (illustrated below) are pretty perilous, because in either case O has the obvious\noption (which he would be foolish to pass up) of removing x's from the board:\n\nxxxxoooo\n^        X moves 4 to 6.\nxxxxoooo\n^    O moves 8 to 4, giving \"xxoxxooo\".  The two\nsurrounded x's are removed.\nxxoooo\n\nor\n\nxxxxoooo\n^       X moves 5 to 6.\nxxxxoooo\n^    O moves 8 to 5, giving \"xxxoxooo\".  The one\nsurrounded x is removed.\nxxxoooo\n\nBoth contingencies are quite bad for X -- but this is not captured by the fact that they start\nout with X thinking his move will be harmless, having a payoff of zero.\n\nSo what's needed is for X to think *more* than one step ahead -- to consider not merely what it\ncan do in this move, and what the payoff is, but to consider what O might do in response, and\nthe payoff of those potential moves, and so on with X's possible responses to those cases could\nbe. All these possibilities form a game tree -- a tree where each node is a board, and its\nchildren are successors of that node -- i.e., the boards that could result from every move\npossible, given the parent's board.\n\nBut how to represent the tree, and how to represent the nodes?\n\nWell, consider that a node holds several pieces of data:\n\n1) the configuration of the board, which, being nice and simple and one-dimensional, can be\nstored as just a string, like \"xxxxoooo\".\n\n2) whose turn it is, X or O. (Or: who moved last, from which we can figure whose turn it is).\n\n3) the successors (child nodes).\n\n4) the immediate payoff of having moved to this board position from its predecessor (parent\nnode).\n\n5) and what move gets us from our predecessor node to here. (Granted, knowing the board\nconfiguration before and after the move, it's easy to figure out the move; but it's easier still\nto store it as one is figuring out a node's successors.)\n\n6) whatever else we might want to add later.\n\nThese could be stored equally well in an array or in a hash, but it's my experience that hashes\nare best for cases where you have more than just two or three bits of data, or especially when\nyou might need to add new bits of data. Moreover, hash key names are mnemonic --\n$node->{'lastmovepayoff'} is plain as day, whereas it's not so easy having to remember with an\narray that $node->[3] is where you decided to keep the payoff.\n\nFootnote: Of course, there are ways around that problem: just swear you'll never use a real\nnumeric index to access data in the array, and instead use constants with mnemonic names:\n\nuse strict;\nuse constant idxPAYOFF => 3;\n...\n$n->[idxPAYOFF]\n\nOr use a pseudohash. But I prefer to keep it simple, and use a hash.\n\nThese are, incidentally, the same arguments that people weigh when trying to decide whether\ntheir object-oriented modules should be based on blessed hashes, blessed arrays, or what.\nEssentially the only difference here is that we're not blessing our nodes or talking in\nterms of classes and methods.\n\n[end footnote]\n\nSo, we might as well represent nodes like so:\n\n$node = { # hashref\n'board'          => ...board string, e.g., \"xxxxoooo\"\n\n'lastmovepayoff' => ...payoff of the move\nthat got us here.\n\n'lastmovefrom' =>  ...the start...\n'lastmoveto'   =>  ...and end point of the move\nthat got us here.  E.g., 5 and 6,\nrepresenting a move from 5 to 6.\n\n'whoseturn'     => ...whose move it then becomes.\njust an 'x' or 'o'.\n\n'successors' => ...the successors\n};\n\nNote that we could have a field called something like 'lastmovewho' to denote who last moved,\nbut since turns in Alak always alternate (and no-one can pass), storing whose move it is now\n*and* who last moved is redundant -- if X last moved, it's O turn now, and vice versa. I chose\nto have a 'whoseturn' field instead of a 'lastmovewho', but it doesn't really matter. Either\nway, we'll end up inferring one from the other at several points in the program.\n\nWhen we want to store the successors of a node, should we use an array or a hash? On the one\nhand, the successors to $node aren't essentially ordered, so there's no reason to use an array\nper se; on the other hand, if we used a hash, with successor nodes as values, we don't have\nanything particularly meaningful to use as keys. (And we can't use the successors themselves as\nkeys, since the nodes are referred to by hash references, and you can't use a reference as a\nhash key.) Given no particularly compelling reason to do otherwise, I choose to just use an\narray to store all a node's successors, although the order is never actually used for anything:\n\n$node = {\n...\n'successors' => [ ...nodes... ],\n...\n};\n\nIn any case, now that we've settled on what should be in a node, let's make a little sample tree\nout of a few nodes and see what we can do with it:\n\n# Board just before move 3 in above game\nmy $n0 = {\n'board' => 'xxxxoooo',\n'lastmovepayoff' => 0,\n'lastmovefrom' =>  9,\n'lastmoveto'   =>  7,\n'whoseturn' => 'x',\n'successors' => [],\n};\n\n# And, for now, just two of the successors:\n\n# X moves 4 to 6, giving xxxxoooo\nmy $n1 = {\n'board' => 'xxxxoooo',\n'lastmovepayoff' => 0,\n'lastmovefrom' =>  4,\n'lastmoveto'   =>  6,\n'whoseturn' => 'o',\n'successors' => [],\n};\n\n# or X moves 5 to 6, giving xxxxoooo\nmy $n2 = {\n'board' => 'xxxxoooo',\n'lastmovepayoff' => 0,\n'lastmovefrom' =>  5,\n'lastmoveto'   =>  6,\n'whoseturn' => 'o',\n'successors' => [],\n};\n\n# Now connect them...\npush @{$n0->{'successors'}}, $n1, $n2;\n"
                },
                {
                    "name": "Digression: Links to Parents",
                    "content": "In comparing what we store in an Alak game tree node to what HTML::Element stores in HTML\nelement nodes, you'll note one big difference: every HTML::Element node contains a link to its\nparent, whereas we don't have our Alak nodes keeping a link to theirs.\n\nThe reason this can be an important difference is because it can affect how Perl knows when\nyou're not using pieces of memory anymore. Consider the tree we just built, above:\n\nnode 0\n/      \\\nnode 1    node 2\n\nThere's two ways Perl knows you're using a piece of memory: 1) it's memory that belongs directly\nto a variable (i.e., is necessary to hold that variable's value, or value*s* in the case of a\nhash or array), or 2) it's a piece of memory that something holds a reference to. In the above\ncode, Perl knows that the hash for node 0 (for board \"xxxxoooo\") is in use because something\n(namely, the variable $n0) holds a reference to it. Now, even if you followed the above code\nwith this:\n\n$n1 = $n2 = 'whatever';\n\nto make your variables $n1 and $n2 stop holding references to the hashes for the two successors\nof node 0, Perl would still know that those hashes are still in use, because node 0's successors\narray holds a reference to those hashes. And Perl knows that node 0 is still in use because\nsomething still holds a reference to it. Now, if you added:\n\nmy $root = $n0;\n\nThis would change nothing -- there's just be *two* things holding a reference to the node 0\nhash, which in turn holds a reference to the node 1 and node 2 hashes. And if you then added:\n\n$n0 = 'stuff';\n\nstill nothing would change, because something ($root) still holds a reference to the node 0\nhash. But once *nothing* holds a reference to the node 0 hash, Perl will know it can destroy\nthat hash (and reclaim the memory for later use, say), and once it does that, nothing will hold\na reference to the node 1 or the node 2 hashes, and those will be destroyed too.\n\nBut consider if the node 1 and node 2 hashes each had an attribute \"parent\" (or \"predecessor\")\nthat held a reference to node 0. If your program stopped holding a reference to the node 0 hash,\nPerl could *not* then say that *nothing* holds a reference to node 0 -- because node 1 and node\n2 still do. So, the memory for nodes 0, 1, and 2 would never get reclaimed (until your program\nended, at which point Perl destroys *everything*). If your program grew and discarded lots of\nnodes in the game tree, but didn't let Perl know it could reclaim their memory, your program\ncould grow to use immense amounts of memory -- never a nice thing to have happen. There's three\nways around this:\n\n1) When you're finished with a node, delete the reference each of its children have to it (in\nthis case, deleting $n1->{'parent'}, say). When you're finished with a whole tree, just go\nthrough the whole tree erasing links that children have to their children.\n\n2) Reconsider whether you really need to have each node hold a reference to its parent. Just not\nhaving those links will avoid the whole problem.\n\n3) use the WeakRef module with Perl 5.6 or later. This allows you to \"weaken\" some references\n(like the references that node 1 and 2 could hold to their parent) so that they don't count when\nPerl goes asking whether anything holds a reference to a given piece of memory. This wonderful\nnew module eliminates the headaches that can often crop up with either of the two previous\nmethods.\n\nIt so happens that our Alak program is simple enough that we don't need for our nodes to have\nlinks to their parents, so the second solution is fine. But in a more advanced program, the\nfirst or third solutions might be unavoidable.\n"
                },
                {
                    "name": "Recursively Printing the Tree",
                    "content": "I don't like working blind -- if I have any kind of a complex data structure in memory for a\nprogram I'm working on, the first thing I do is write something that can dump that structure to\nthe screen so I can make sure that what I *think* is in memory really *is* what's in memory.\nNow, I could just use the \"x\" pretty-printer command in Perl's interactive debugger, or I could\nhave the program use the \"Data::Dumper\" module. But in this case, I think the output from those\nis rather too verbose. Once we have trees with dozens of nodes in them, we'll really want a dump\nof the tree to be as concise as possible, hopefully just one line per node. What I'd like is\nsomething that can print $n0 and its successors (see above) as something like:\n\nxxxxoooo  (O moved 9 to 7, 0 payoff)\nxxxxoooo  (X moved 4 to 6, 0 payoff)\nxxxxoooo  (X moved 5 to 6, 0 payoff)\n\nA subroutine to print a line for a given node, and then do that again for each successor, would\nlook something like:\n\nsub dumptree {\nmy $n = $[0]; # \"n\" is for node\nprint\n...something expressing $n'n content...\nforeach my $s (@{$n->{'successors'}}) {\n# \"s for successor\ndump($s);\n}\n}\n\nAnd we could just start that out with a call to \"dumptree($n0)\".\n\nSince this routine...\n\nFootnote: I first wrote this routine starting out with \"sub dump {\". But when I tried\nactually calling \"dump($n0)\", Perl would dump core! Imagine my shock when I discovered that\nthis is absolutely to be expected -- Perl provides a built-in function called \"dump\", the\npurpose of which is to, yes, make Perl dump core. Calling our routine \"dumptree\" instead of\n\"dump\" neatly avoids that problem.\n\n...does its work (dumping the subtree at and under the given node) by calling itself, it's\nrecursive. However, there's a special term for this kind of recursion across a tree: traversal.\nTo traverse a tree means to do something to a node, and to traverse its children. There's two\nprototypical ways to do this, depending on what happens when:\n\ntraversing X in pre-order:\n* do something to X\n* then traverse X's children\n\ntraversing X in post-order:\n* traverse X's children\n* then do something to X\n\nDumping the tree to the screen the way we want it happens to be a matter of pre-order traversal,\nsince the thing we do (print a description of the node) happens before we recurse into the\nsuccessors.\n\nWhen we try writing the \"print\" statement for our above \"dumptree\", we can get something like:\n\nsub dumptree {\nmy $n = $[0];\n\n# \"xxxxoooo  (O moved 9 to 7, 0 payoff)\"\nprint\n$n->{'board'}, \"  (\",\n($n->{'whoseturn'} eq 'o' ? 'X' : 'O'),\n# Infer who last moved from whose turn it is now.\n\" moved \", $n->{'lastmovefrom'},\n\" to \",    $n->{'lastmoveto'},\n\", \",      $n->{'lastmovepayoff'},\n\" payoff)\\n\",\n;\n\nforeach my $s (@{$n->{'successors'}}) {\ndumptree($s);\n}\n}\n\nIf we run this on $n0 from above, we get this:\n\nxxxxoooo  (O moved 9 to 7, 0 payoff)\nxxxxoooo  (X moved 4 to 6, 0 payoff)\nxxxxoooo  (X moved 5 to 6, 0 payoff)\n\nEach line on its own is fine, but we forget to allow for indenting, and without that we can't\ntell what's a child of what. (Imagine if the first successor had successors of its own -- you\nwouldn't be able to tell if it were a child, or a sibling.) To get indenting, we'll need to have\nthe instances of the \"dumptree\" routine know how far down in the tree they're being called, by\npassing a depth parameter between them:\n\nsub dumptree {\nmy $n = $[0];\nmy $depth = $[1];\n$depth = 0 unless defined $depth;\nprint\n\"  \" x $depth,\n...stuff...\nforeach my $s (@{$n->{'successors'}}) {\ndumptree($s, $depth + 1);\n}\n}\n\nWhen we call \"dumptree($n0)\", $depth (from $[1]) is undefined, so gets set to 0, which\ntranslates into an indenting of no spaces. But when \"dumptree\" invokes itself on $n0's\nchildren, those instances see $depth + 1 as their $[1], giving appropriate indenting.\n\nFootnote: Passing values around between different invocations of a recursive routine, as\nshown, is a decent way to share the data. Another way to share the data is by keeping it in\na global variable, like $Depth, initially set to 0. Each time \"dumptree\" is about to\nrecurse, it must \"++$Depth\", and when it's back, it must \"--$Depth\".\n\nOr, if the reader is familiar with closures, consider this approach:\n\nsub dumptree {\n# A wrapper around calls to a recursive closure:\nmy $startnode = $[0];\nmy $depth = 0;\n# to be shared across calls to $recursor.\nmy $recursor;\n$recursor = sub {\nmy $n = $[0];\nprint \"  \" x $depth,\n...stuff...\n++$depth;\nforeach my $s (@{$n->{'successors'}}) {\n$recursor->($s);\n}\n--$depth;\n}\n$recursor->($startnode); # start recursing\nundef $recursor;\n}\n\nThe reader with an advanced understanding of Perl's reference-count-based garbage collection\nis invited to consider why it is currently necessary to undef $recursor (or otherwise change\nits value) after all recursion is done.\n\nThe reader whose mind is perverse in other ways is invited to consider how (or when!)\npassing a depth parameter around is unnecessary because of information that Perl's caller(N)\nfunction reports!\n\n[end footnote]\n"
                },
                {
                    "name": "Growing the Tree",
                    "content": "Our \"dumptree\" routine works fine for the sample tree we've got, so now we should get the\nprogram working on making its own trees, starting from a given board.\n\nIn \"Games::Alak\" (the CPAN-released version of Alak that uses essentially the same code that\nwe're currently discussing the tree-related parts of), there is a routine called\n\"figuresuccessors\" that, given one childless node, will figure out all its possible successors.\nThat is, it looks at the current board, looks at every piece belonging to the player whose turn\nit is, and considers the effect of moving each piece every possible way -- notably, it figures\nout the immediate payoff, and if that move would end the game, it notes that by setting an\n\"endgame\" entry in that node's hash. (That way, we know that that's a node that *can't* have\nsuccessors.)\n\nIn the code for \"Games::Alak\", \"figuresuccessors\" does all these things, in a rather\nstraightforward way. I won't walk you through the details of the \"figuresuccessors\" code I've\nwritten, since the code has nothing much to do with trees, and is all just implementation of the\nAlak rules for what can move where, with what result. Especially interested readers can puzzle\nover that part of code in the source listing in the archive from CPAN, but others can just\nassume that it works as described above.\n\nBut consider that \"figuresuccessors\", regardless of its inner workings, does not grow the\n*tree*; it only makes one set of successors for one node at a time. It has to be up to a\ndifferent routine to call \"figuresuccessors\", and to keep applying it as needed, in order to\nmake a nice big tree that our game-playing program can base its decisions on.\n\nNow, we could do this by just starting from one node, applying \"figuresuccessors\" to it, then\napplying \"figuresuccessors\" on all the resulting children, and so on:\n\nsub grow {  # Just a first attempt at this!\nmy $n = $[0];\nfiguresuccessors($n);\nunless\n@{$n->{'successors'}}\n# already has successors.\nor $n->{'endgame'}\n# can't have successors.\n}\nforeach my $s (@{$n->{'successors'}}) {\ngrow($s); # recurse\n}\n}\n\nIf you have a game tree for tic-tac-toe, and you grow it without limitation (as above), you will\nsoon enough have a fully \"solved\" tree, where every node that *can* have successors *does*, and\nall the leaves of the tree are *all* the possible endgames (where, in each case, the board is\nfilled). But a game of Alak is different from tic-tac-toe, because it can, in theory, go on\nforever. For example, the following sequence of moves is quite possible:\n\nxxxxoooo\nxxxxoooo\nxxxxoooo\nxxxxoooo (x moved back)\nxxxxoooo (o moved back)\n...repeat forever...\n\nSo if you tried using our above attempt at a \"grow\" routine, Perl would happily start trying to\nconstruct an infinitely deep tree, containing an infinite number of nodes, consuming an infinite\namount of memory, and requiring an infinite amount of time. As the old saying goes: \"You can't\nhave everything -- where would you put it?\" So we have to place limits on how much we'll grow\nthe tree.\n\nThere's more than one way to do this:\n\n1. We could grow the tree until we hit some limit on the number of nodes we'll allow in the\ntree.\n\n2. We could grow the tree until we hit some limit on the amount of time we're willing to spend.\n\n3. Or we could grow the tree until it is fully fleshed out to a certain depth.\n\nSince we already know to track depth (as we did in writing \"dumptree\"), we'll do it that way,\nthe third way. The implementation for that third approach is also pretty straightforward:\n\n$Maxdepth = 3;\nsub grow {\nmy $n = $[0];\nmy $depth = $[1] || 0;\nfiguresuccessors($n)\nunless\n$depth >= $Maxdepth\nor @{$n->{'successors'}}\nor $n->{'endgame'}\n}\nforeach my $s (@{$n->{'successors'}}) {\ngrow($s, $depth + 1);\n}\n# If we're at $Maxdepth, then figuresuccessors\n#  didn't get called, so there's no successors\n#  to recurse under -- that's what stops recursion.\n}\n\nIf we start from a single node (whether it's a node for the starting board \"xxxxoooo\", or for\nwhatever board the computer is faced with), set $Maxdepth to 4, and apply \"grow\" to it, it will\ngrow the tree to include several hundred nodes.\n\nFootnote: If at each move there are four pieces that can move, and they can each move right\nor left, the \"branching factor\" of the tree is eight, giving a tree with 1 (depth 0) + 8\n(depth 1) + 8  2 + 8  3 + 8  4 = 4681 nodes in it. But, in practice, not all pieces\ncan move in both directions (none of the x pieces in \"xxxxoooo\" can move left, for\nexample), and there may be fewer than four pieces, if some were lost. For example, there are\n801 nodes in a tree of depth four starting from \"xxxxoooo\", suggesting an average\nbranching factor of about five (801  (1/4) is about 5.3), not eight.\n\nWhat we need to derive from that tree is the information about what are the best moves for X.\nThe simplest way to consider the payoff of different successors is to just average them -- but\nwhat we average isn't always their immediate payoffs (because that'd leave us using only one\ngeneration of information), but the average payoff of *their* successors, if any. We can\nformalize this as:\n\nTo figure a node's average payoff:\nIf the node has successors:\nFigure each successor's average payoff.\nMy average payoff is the average of theirs.\nOtherwise:\nMy average payoff is my immediate payoff.\n\nSince this involves recursing into the successors *before* doing anything with the current node,\nthis will traverse the tree *in post-order*.\n\nWe could work that up as a routine of its own, and apply that to the tree after we've applied\n\"grow\" to it. But since we'd never grow the tree without also figuring the average benefit, we\nmight as well make that figuring part of the \"grow\" routine itself:\n\n$Maxdepth = 3;\nsub grow {\nmy $n = $[0];\nmy $depth = $[1] || 0;\nfiguresuccessors($n);\nunless\n$depth >= $Maxdepth\nor @{$n->{'successors'}}\nor $n->{'endgame'}\n}\n\nif(@{$n->{'successors'}}) {\nmy $apayoffsum = 0;\nforeach my $s (@{$n->{'successors'}}) {\ngrow($s, $depth + 1);  # RECURSE\n$apayoffsum += $s->{'averagepayoff'};\n}\n$n->{'averagepayoff'}\n= $apayoffsum / @{$n->{'successors'}};\n} else {\n$n->{'averagepayoff'}\n= $n->{'lastmovepayoff'};\n}\n}\n\nSo, by time \"grow\" has applied to a node (wherever in the tree it is), it will have figured\nsuccessors if possible (which, in turn, sets \"lastmovepayoff\" for each node it creates), and\nwill have set \"averagebenefit\".\n\nBeyond this, all that's needed is to start the board out with a root note of \"xxxxoooo\", and\nhave the computer (X) take turns with the user (O) until someone wins. Whenever it's O's turn,\n\"Games::Alak\" presents a prompt to the user, letting him know the state of the current board,\nand asking what move he selects. When it's X's turn, the computer grows the game tree as\nnecessary (using just the \"grow\" routine from above), then selects the move with the highest\naverage payoff (or one of the highest, in case of a tie).\n\nIn either case, \"selecting\" a move means just setting that move's node as the new root of the\nprogram's game tree. Its sibling nodes and their descendants (the boards that *didn't* get\nselected) and its parent node will be erased from memory, since they will no longer be in use\n(as Perl can tell by the fact that nothing holds references to them anymore).\n\nThe interface code in \"Games::Alak\" (the code that prompts the user for his move) actually\nsupports quite a few options besides just moving -- including dumping the game tree to a\nspecified depth (using a slightly fancier version of \"dumptree\", above), resetting the game,\nchanging $Maxdepth in the middle of the game, and quitting the game. Like \"figuresuccessors\",\nit's a bit too long to print here, but interested users are welcome to peruse (and freely\nmodify) the code, as well as to enjoy just playing the game.\n\nNow, in practice, there's more to game trees than this: for games with a larger branching factor\nthan Alak has (which is most!), game trees of depth four or larger would contain too many nodes\nto be manageable, most of those nodes being strategically quite uninteresting for either player;\ndealing with game trees specifically is therefore a matter of recognizing uninteresting\ncontingencies and not bothering to grow the tree under them.\n\nFootnote: For example, to choose a straightforward case: if O has a choice between moves\nthat put him in immediate danger of X winning and moves that don't, then O won't ever choose\nthe dangerous moves (and if he does, the computer will know enough to end the game), so\nthere's no point in growing the tree any further beneath those nodes.\n\nBut this sample implementation should illustrate the basics of how to build and manipulate a\nsimple tree structure in memory. And once you've understood the basics of tree storage here, you\nshould be ready to better understand the complexities and peculiarities of other systems for\ncreating, accessing, and changing trees, including Tree::DAGNode, HTML::Element, XML::DOM, or\nrelated formalisms like XPath and XSL.\n\n[end body of article]\n\n[Author Credit]\nSean M. Burke (\"sburke@cpan.org\") is a tree-dwelling hominid.\n"
                },
                {
                    "name": "References",
                    "content": "Dewdney, A[lexander] K[eewatin]. 1984. *Planiverse: Computer Contact with a Two-Dimensional\nWorld.* Poseidon Press, New York.\n\nKnuth, Donald Ervin. 1997. *Art of Computer Programming, Volume 1, Third Edition: Fundamental\nAlgorithms*. Addison-Wesley, Reading, MA.\n\nWirth, Niklaus. 1976. *Algorithms + Data Structures = Programs* Prentice-Hall, Englewood Cliffs,\nNJ.\n\nWorth, Stan and Allman Sheldon. Circa 1967. *George of the Jungle* theme. [music by Jay Ward.]\n\nWirth's classic, currently and lamentably out of print, has a good section on trees. I find it\nclearer than Knuth's (if not quite as encyclopedic), probably because Wirth's example code is in\na block-structured high-level language (basically Pascal), instead of in assembler (MIX). I\nbelieve the book was re-issued in the 1980s under the titles *Algorithms and Data Structures*\nand, in a German edition, *Algorithmen und Datenstrukturen*. Cheap copies of these editions\nshould be available through used book services such as \"abebooks.com\".\n\nWorth's classic, however, is available on the soundtrack to the 1997 *George of the Jungle*\nmovie, as performed by The Presidents of the United States of America.\n"
                }
            ]
        },
        "BACK": {
            "content": "Return to the HTML::Tree docs.\n",
            "subsections": []
        }
    },
    "summary": "HTML::Tree::AboutTrees -- article on tree-shaped data structures in Perl",
    "flags": [],
    "examples": [],
    "see_also": []
}