{
    "mode": "info",
    "parameter": "HTML::Tree::AboutTrees",
    "section": "",
    "url": "https://www.chedong.com/phpMan.php/info/HTML%3A%3ATree%3A%3AAboutTrees/json",
    "generated": "2026-07-05T13:41:19Z",
    "synopsis": "# This an article, not a module.",
    "sections": {
        "NAME": {
            "content": "HTML::Tree::AboutTrees -- article on tree-shaped data structures in\nPerl\n",
            "subsections": []
        },
        "SYNOPSIS": {
            "content": "# This an article, not a module.\n",
            "subsections": []
        },
        "DESCRIPTION": {
            "content": "The following article by Sean M. Burke first appeared in The Perl\nJournal #18 and is copyright 2000 The Perl Journal. It appears courtesy\nof Jon Orwant and The Perl Journal.  This document may be distributed\nunder the same terms as Perl itself.\n",
            "subsections": []
        },
        "Trees": {
            "content": "-- Sean M. Burke\n\n\"AaaAAAaauugh!  Watch out for that tree!\"\n-- George of the Jungle theme\n\nPerl's facility with references, combined with its automatic management\nof memory allocation, makes it straightforward to write programs that\nstore data in structures of arbitrary form and complexity.\n\nBut I've noticed that many programmers, especially those who started\nout with more restrictive languages, seem at home with complex but\nuniform data structures -- N-dimensional arrays, or more struct-like\nthings like hashes-of-arrays(-of-hashes(-of-hashes), etc.) -- but\nthey're often uneasy with building more freeform, less tabular\nstructures, like tree-shaped data structures.\n\nBut trees are easy to build and manage in Perl, as I'll demonstrate by\nshowing off how the HTML::Element class manages elements in an HTML\ndocument tree, and by walking you through a from-scratch implementation\nof game trees.  But first we need to nail down what we mean by a\n\"tree\".\n\nSocratic Dialogues: \"What is a Tree?\"\nMy first brush with tree-shaped structures was in linguistics classes,\nwhere tree diagrams are used to describe the syntax underlying natural\nlanguage sentences.  After learning my way around those trees, I\nstarted to wonder -- are what I'm used to calling \"trees\" the same as\nwhat programmers call \"trees\"?  So I asked lots of helpful and patient\nprogrammers how they would define a tree.  Many replied with a answer\nin jargon that they could not really explain (understandable, since\nexplaining things, especially defining things, is harder than people\nthink):\n\n-- So what is a \"tree\", a tree-shaped data structure?\n\n-- A tree is a special case of an acyclic directed graph!\n\n-- What's a \"graph\"?\n\n-- Um... lines... and... you draw it... with... arcs! nodes!  um...\n\nThe most helpful were folks who couldn't explain directly, but with\nwhom I could get into a rather Socratic dialog (where I asked the half-\ndim half-earnest questions), often with much doodling of\nillustrations...\n\nQuestion: so what's a tree?\n\nAnswer: A tree is a collection of nodes that are linked together in a,\nwell, tree-like way!  Like this [drawing on a napkin]:\n\nA\n/ \\\nB   C\n/ | \\\nD  E  F\n\nQ: So what do these letters represent?\n\nA: Each is a different node, a bunch of data.  Maybe C is a bunch of\ndata that stores a number, maybe a hash table, maybe nothing at all\nbesides the fact that it links to D, E, and F (which are other nodes).\n\nQ: So what're the lines between the nodes?\n\nA: Links.  Also called \"arcs\".  They just symbolize the fact that each\nnode holds a list of nodes it links to.\n\nQ: So what if I draw nodes and links, like this...\n\nB -- E\n/ \\  / \\\nA   C\n\\ /\nE\n\nIs that still a tree?\n\nA: No, not at all.  There's a lot of un-treelike things about that.\nFirst off, E has a link coming off of it going into nowhere.  You can't\nhave a link to nothing -- you can only link to another node.  Second\noff, I don't know what that sideways link between B and E means...\n\nQ: Okay, let's work our way up from something simpler.  Is this a\ntree...?\n\nA\n\nA: Yes, I suppose.  It's a tree of just one node.\n\nQ: And how about...\n\nA\n\nB\n\nA: No, you can't just have nodes floating there, unattached.\n\nQ: Okay, I'll link A and B.  How's this?\n\nA\n|\nB\n\nA: Yup, that's a tree.  There's a node A, and a node B, and they're\nlinked.\n\nQ: How is that tree any different from this one...?\n\nB\n|\nA\n\nA: Well, in both cases A and B are linked.  But it's in a different\ndirection.\n\nQ: Direction?  What does the direction mean?\n\nA: Well, it depends what the tree represents.  If it represents a\ncategorization, like this:\n\ncitrus\n/    |    \\\norange  lemon  kumquat ...\n\nthen you mean to say that oranges, lemons, kumquats, etc., are a kind\nof citrus.  But if you drew it upside down, you'd be saying, falsely,\nthat citrus is a kind of kumquat, a kind of lemon, and a kind of\norange.  If the tree represented cause-and-effect (or at least what\nsituations could follow others), or represented what's a part of what,\nyou wouldn't want to get those backwards, either.  So with the nodes\nyou draw together on paper, one has to be over the other, so you can\ntell which way the relationship in the tree works.\n\nQ:  So are these two trees the same?\n\nA          A\n/ \\        / \\\nB   C      B   \\\nC\n\nA: Yes, although by convention we often try to line up things in the\nsame generation, like it is in the diagram on the left.\n\nQ: \"generation\"?  This is a family tree?\n\nA: No, not unless it's a family tree for just yeast cells or something\nelse that reproduces asexually.  But for sake of having lots of terms\nto use, we just pretend that links in the tree represent the \"is a\nchild of\" relationship, instead of \"is a kind of\" or \"is a part of\", or\n\"could result from\", or whatever the real relationship is.  So we get\nto borrow a lot of kinship words for describing trees -- B and C are\n\"children\" (or \"daughters\") of A; A is the \"parent\" (or \"mother\") of B\nand C.  Node C is a \"sibling\" (or \"sister\") of node C; and so on, with\nterms like \"descendants\" (a node's children, children's children,\netc.), and \"generation\" (all the nodes at the same \"level\" in the tree,\ni.e., are either all grandchildren of the top node, or all great-grand-\nchildren, etc.), and \"lineage\" or \"ancestors\" (parents, and parent's\nparents, etc., all the way to the topmost node).\n\nSo then we get to express rules in terms like \"A node cannot have more\nthan one parent\", which means that this is not a valid tree:\n\nA\n/ \\\nB   C\n\\ /\nE\n\nAnd: \"A node can't be its own parent\", which excludes this looped-up\nconnection:\n\n/\\\nA  |\n\\/\n\nOr, put more generally: \"A node can't be its own ancestor\", which\nexcludes the above loop, as well as the one here:\n\n/\\\nZ  |\n/   |\nA    |\n/ \\   |\nB   C  |\n\\/\n\nThat tree is excluded because A is a child of Z, and Z is a child of C,\nand C is a child of A, which means A is its own great-grandparent.  So\nthis whole network can't be a tree, because it breaks the sort of meta-\nrule: once any node in the supposed tree breaks the rules for trees,\nyou don't have a tree anymore.\n\nQ: Okay, now, are these two trees the same?\n\nA         A\n/ | \\     / | \\\nB  C  D   D  C  B\n\nA: It depends whether you're basing your concept of trees on each node\nhaving a set (unordered list) of children, or an (ordered) list of\nchildren.  It's a question of whether ordering is important for what\nyou're doing.  With my diagram of citrus types, ordering isn't\nimportant, so these tree diagrams express the same thing:\n\ncitrus\n/    |    \\\norange  lemon  kumquat\n\ncitrus\n/     |    \\\nkumquat  orange  lemon\n\nbecause it doesn't make sense to say that oranges are \"before\" or\n\"after\" kumquats in the whole botanical scheme of things.  (Unless, of\ncourse, you are using ordering to mean something, like a degree of\ngenetic similarity.)\n\nBut consider a tree that's a diagram of what steps are comprised in an\nactivity, to some degree of specificity:\n\nmake tea\n/    |     \\\npour     infuse   serve\nhot water    / \\\nin cup/pot  /     \\\nadd     let\ntea     sit\nleaves\n\nThis means that making tea consists of putting hot water in a cup or\nput, infusing it (which itself consists of adding tea leaves and\nletting it sit), then serving it -- in that order.  If you serve an\nempty dry pot (sipping from empty cups, etc.), let it sit, add tea\nleaves, and pour in hot water, then what you're doing is performance\nart, not tea preparation:\n\nperformance\nart\n/    |     \\\nserve   infuse    pour\n/ \\       hot water\n/     \\      in cup/pot\nlet     add\nsit     tea\nleaves\n\nExcept for my having renamed the root, this tree is the same as the\nmaking-tea tree as far as what's under what, but it differs in order,\nand what the tree means makes the order important.\n\nQ: Wait -- \"root\"?  What's a root?\n\nA: Besides kinship terms like \"mother\" and \"daughter\", the jargon for\ntree parts also has terms from real-life tree parts:  the part that\neverything else grows from is called the root; and nodes that don't\nhave nodes attached to them (i.e., childless nodes) are called\n\"leaves\".\n\nQ: But you've been drawing all your trees with the root at the top and\nleaves at the bottom.\n\nA: Yes, but for some reason, that's the way everyone seems to think of\ntrees.  They can draw trees as above; or they can draw them sort of\nsideways with indenting representing what nodes are children of what:\n\n* make tea\n* pour hot water in cup/pot\n* infuse\n* add tea leaves\n* let sit\n* serve\n\n...but folks almost never seem to draw trees with the root at the\nbottom.  So imagine it's based on spider plant in a hanging pot.\nUnfortunately, spider plants aren't botanically trees, they're plants;\nbut \"spider plant diagram\" is rather a mouthful, so let's just call\nthem trees.\n\nTrees Defined Formally\nIn time, I digested all these assorted facts about programmers' ideas\nof trees (which turned out to be just a more general case of linguistic\nideas of trees) into a single rule:\n\n* A node is an item that contains (\"is over\", \"is parent of\", etc.)\nzero or more other nodes.\n\nFrom this you can build up formal definitions for useful terms, like\nso:\n\n* A node's descendants are defined as all its children, and all their\nchildren, and so on.  Or, stated recursively: a node's descendants are\nall its children, and all its children's descendants.  (And if it has\nno children, it has no descendants.)\n\n* A node's ancestors consist of its parent, and its parent's parent,\netc, up to the root.  Or, recursively: a node's ancestors consist of\nits parent and its parent's ancestors.  (If it has no parent, it has no\nancestors.)\n\n* A tree is a root node and all the root's descendants.\n\nAnd you can add a proviso or two to clarify exactly what I impute to\nthe word \"other\" in \"other nodes\":\n\n* A node cannot contain itself, or contain any node that contains it,\netc.  Looking at it the other way: a node cannot be its own parent or\nancestor.\n\n* A node can be root (i.e., no other node contains it) or can be\ncontained by only one parent; no node can be the child of two or more\nparents.\n\nAdd to this the idea that children are sometimes ordered, and sometimes\nnot, and that's about all you need to know about defining what a tree\nis.  From there it's a matter of using them.\n\nMarkup Language Trees: HTML-Tree\nWhile not all markup languages are inherently tree-like, the best-known\nfamily of markup languages, HTML, SGML, and XML, are about as tree-like\nas you can get.  In these languages, a document consists of elements\nand character data in a tree structure where there is one root element,\nand elements can contain either other elements, or character data.\n\nFootnote: For sake of simplicity, I'm glossing over comments (<!--\n... -->), processing instructions (<?xml version='1.0'>), and\ndeclarations (<!ELEMENT ...>, <!DOCTYPE ...>).  And I'm not\nbothering to distinguish entity references (&lt;, &#64;) or CDATA\nsections (<![CDATA[ ...]]>) from normal text.\n\nFor example, consider this HTML document:\n\n<html lang=\"en-US\">\n<head>\n<title>\nBlank Document!\n</title>\n</head>\n<body bgcolor=\"#d010ff\">\nI've got\n<em>\nsomething to saaaaay\n</em>\n!\n</body>\n</html>\n\nI've indented this to point out what nodes (elements or text items) are\nchildren of what, with each node on a line of its own.\n\nThe HTML::TreeBuilder module (in the CPAN distribution HTML-Tree) does\nthe work of taking HTML source and building in memory the tree that the\ndocument source represents.\n\nFootnote: it requires the HTML::Parser module, which tokenizes the\nsource -- i.e., identifies each tag, bit of text, comment, etc.\n\nThe trees structures that it builds represent bits of text with normal\nPerl scalar string values; but elements are represented with objects --\nthat is, chunks of data that belong to a class (in this case,\nHTML::Element), a class that provides methods (routines) for accessing\nthe pieces of data in each element, and otherwise doing things with\nelements.  (See my article in TPJ#17 for a quick explanation of\nobjects, the POD document \"perltoot\" for a longer explanation, or\nDamian Conway's excellent book Object-Oriented Perl for the full\nstory.)\n\nEach HTML::Element object contains a number of pieces of data:\n\n* its element name (\"html\", \"h1\", etc., accessed as $element->tag)\n\n* a list of elements (or text segments) that it contains, if any\n(accessed as $element->contentlist or $element->content, depending on\nwhether you want a list, or an arrayref)\n\n* what element, if any, contains it (accessed as $element->parent)\n\n* and any SGML attributes that the element has, such as \"lang=\"en-US\"\",\n\"align=\"center\"\", etc. (accessed as $element->attr('lang'),\n$element->attr('center'), etc.)\n\nSo, for example, when HTML::TreeBuilder builds the tree for the above\nHTML document source, the object for the \"body\" element has these\npieces of data:\n\n* element name: \"body\"\n* nodes it contains:\nthe string \"I've got \"\nthe object for the \"em\" element\nthe string \"!\"\n* its parent:\nthe object for the \"html\" element\n* bgcolor: \"#d010ff\"\n\nNow, once you have this tree of objects, almost anything you'd want to\ndo with it starts with searching the tree for some bit of information\nin some element.\n\nAccessing a piece of information in, say, a hash of hashes of hashes,\nis straightforward:\n\n$password{'sean'}{'sburke1'}{'hpux'}\n\nbecause you know that all data points in that structure are accessible\nwith that syntax, but with just different keys.  Now, the \"em\" element\nin the above HTML tree does happen to be accessible as the root's child\n#1's child #1:\n\n$root->content->[1]->content->[1]\n\nBut with trees, you typically don't know the exact location (via\nindexes) of the data you're looking for.  Instead, finding what you\nwant will typically involve searching through the tree, seeing if every\nnode is the kind you want.  Searching the whole tree is simple enough\n-- look at a given node, and if it's not what you want, look at its\nchildren, and so on.  HTML-Tree provides several methods that do this\nfor you, such as \"findbytagname\", which returns the elements (or the\nfirst element, if called in scalar context) under a given node\n(typically the root) whose tag name is whatever you specify.\n\nFor example, that \"em\" node can be found as:\n\nmy $thatem = $root->findbytagname('em');\n\nor as:\n\n@ems = $root->findbytagname('em');\n# will only have one element for this particular tree\n\nNow, given an HTML document of whatever structure and complexity, if\nyou wanted to do something like change every\n\n<em>stuff</em>\n\nto\n\n<em class=\"funky\"> <b>[-</b> stuff <b>-]</b> </em>\n\nthe first step is to frame this operation in terms of what you're doing\nto the tree.  You're changing this:\n\nem\n|\n...\n\nto this:\n\nem\n/  |  \\\nb  ...   b\n|        |\n\"[-\"     \"-]\"\n\nIn other words, you're finding all elements whose tag name is \"em\",\nsetting its class attribute to \"funky\", and adding one child to the\nstart of its content list -- a new \"b\" element whose content is the\ntext string \"[-\" -- and one to the end of its content list -- a new \"b\"\nelement whose content is the text string \"-]\".\n\nOnce you've got it in these terms, it's just a matter of running to the\nHTML::Element documentation, and coding this up with calls to the\nappropriate methods, like so:\n\nuse HTML::Element 1.53;\nuse HTML::TreeBuilder 2.96;\n# Build the tree by parsing the document\nmy $root = HTML::TreeBuilder->new;\n$root->parsefile('whatever.html'); # source file\n\n# Now make new nodes where needed\nforeach my $em ($root->findbytagname('em')) {\n$em->attr('class', 'funky'); # Set that attribute\n\n# Make the two new B nodes\nmy $new1 = HTML::Element->new('b');\nmy $new2 = HTML::Element->new('b');\n# Give them content (they have none at first)\n$new1->pushcontent('[-');\n$new2->pushcontent('-]');\n\n# And put 'em in place!\n$em->unshiftcontent($new1);\n$em->pushcontent($new2);\n}\nprint\n\"<!-- Looky see what I did! -->\\n\",\n$root->asHTML(), \"\\n\";\n\nThe class HTML::Element provides just about every method I can image\nyou needing, for manipulating trees made of HTML::Element objects.\n(And what it doesn't directly provide, it will give you the components\nto build it with.)\n\nBuilding Your Own Trees\nTheoretically, any tree is pretty much like any other tree, so you\ncould use HTML::Element for anything you'd ever want to do with tree-\narranged objects.  However, as its name implies, HTML::Element is\nbasically for HTML elements; it has lots of features that make sense\nonly for HTML elements (like the idea that every element must have a\ntag-name).  And it lacks some features that might be useful for general\napplications -- such as any sort of checking to make sure that you're\nnot trying to arrange objects in a non-treelike way.  For a general-\npurpose tree class that does have such features, you can use\nTree::DAGNode, also available from CPAN.\n\nHowever, if your task is simple enough, you might find it overkill to\nbother using Tree::DAGNode.  And, in any case, I find that the best\nway to learn how something works is to implement it (or something like\nit, but simpler) yourself.  So I'll here discuss how you'd implement a\ntree structure, without using any of the existing classes for tree\nnodes.\n\nImplementation: Game Trees for Alak\nSuppose that the task at hand is to write a program that can play\nagainst a human opponent at a strategic board game (as opposed to a\nboard game where there's an element of chance).  For most such games, a\n\"game tree\" is an essential part of the program (as I will argue,\nbelow), and this will be our test case for implementing a tree\nstructure from scratch.\n\nFor sake of simplicity, our game is not chess or backgammon, but\ninstead a much simpler game called Alak.  Alak was invented by the\nmathematician A. K.  Dewdney, and described in his 1984 book\nPlaniverse. The rules of Alak are simple:\n\nFootnote: Actually, I'm describing only my interpretation of the\nrules Dewdney describes in Planiverse.  Many other interpretations\nare possible.\n\n* Alak is a two-player game played on a one-dimensional board with\neleven slots on it.  Each slot can hold at most one piece at a time.\nThere's two kinds of pieces, which I represent here as \"x\" and \"o\" --\nx's belong to one player (called X), o's to the other (called O).\n\n* The initial configuration of the board is:\n\nxxxxoooo\n\nFor sake of the article, the slots are numbered from 1 (on the left) to\n11 (on the right), and X always has the first move.\n\n* The players take turns moving.  At each turn, each player can move\nonly one piece, once.  (This unlike checkers, where you move one piece\nper move but get to keep moving it if you jump an your opponent's\npiece.) A player cannot pass up on his turn.  A player can move any one\nof his pieces to the next unoccupied slot to its right or left, which\nmay involve jumping over occupied slots.  A player cannot move a piece\noff the side of the board.\n\n* If a move creates a pattern where the opponent's pieces are\nsurrounded, on both sides, by two pieces of the mover's color (with no\nintervening unoccupied blank slot), then those surrounded pieces are\nremoved from the board.\n\n* The goal of the game is to remove all of your opponent's pieces, at\nwhich point the game ends.  Removing all-but-one ends the game as well,\nsince the opponent can't surround you with one piece, and so will\nalways lose within a few moves anyway.\n\nConsider, then, this rather short game where X starts:\n\nxxxxoooo\n^         Move 1: X moves from 3 (shown with caret) to 5\n(Note that any of X's pieces could move, but\nthat the only place they could move to is 5.)\nxxxxoooo\n^   Move 2: O moves from 9 to 7.\nxxxxoooo\n^        Move 3: X moves from 4 to 6.\nxxxxoooo\n^  Move 4: O (stupidly) moves from 10 to 9.\nxxxxoooo\n^       Move 5: X moves from 5 to 10, making the board\n\"xxxoooxo\".  The three o's that X just\nsurrounded are removed.\nxxxxo\nO has only one piece, so has lost.\n\nNow, move 4 could have gone quite the other way:\n\nxxxxoooo\nMove 4: O moves from 8 to 4, making the board\n\"xxoxxooo\".  The surrounded x's are removed.\nxxoooo\n^           Move 5: X moves from 1 to 2.\nxxoooo\n^     Move 6: O moves from 7 to 6.\nxxoooo\n^          Move 7: X moves from 2 to 5, removing the o at 4.\nxxooo\n...and so on.\n\nTo teach a computer program to play Alak (as player X, say), it needs\nto be able to look at the configuration of the board, figure out what\nmoves it can make, and weigh the benefit or costs, immediate or\neventual, of those moves.\n\nSo consider the board from just before move 3, and figure all the\npossible moves X could make.  X has pieces in slots 1, 2, 4, and 5.\nThe leftmost two x's (at 1 and 2) are up against the end of the board,\nso they can move only right.  The other two x's (at 4 and 5) can move\neither right or left:\n\nStarting board: xxxxoooo\nmoving 1 to 3 gives xxxxoooo\nmoving 2 to 3 gives xxxxoooo\nmoving 4 to 3 gives xxxxoooo\nmoving 5 to 3 gives xxxxoooo\nmoving 4 to 6 gives xxxxoooo\nmoving 5 to 6 gives xxxxoooo\n\nFor the computer to decide which of these is the best move to make, it\nneeds to quantify the benefit of these moves as a number -- call that\nthe \"payoff\".  The payoff of a move can be figured as just the number\nof x pieces removed by the most recent move, minus the number of o\npieces removed by the most recent move.  (It so happens that the rules\nof the game mean that no move can delete both o's and x's, but the\nformula still applies.)  Since none of these moves removed any pieces,\nall these moves have the same immediate payoff: 0.\n\nNow, we could race ahead and write an Alak-playing program that could\nuse the immediate payoff to decide which is the best move to make.  And\nwhen there's more than one best move (as here, where all the moves are\nequally good), it could choose randomly between the good alternatives.\nThis strategy is simple to implement; but it makes for a very dumb\nprogram.  Consider what O's response to each of the potential moves\n(above) could be.  Nothing immediately suggests itself for the first\nfour possibilities (X having moved something to position 3), but either\nof the last two (illustrated below) are pretty perilous, because in\neither case O has the obvious option (which he would be foolish to pass\nup) of removing x's from the board:\n\nxxxxoooo\n^        X moves 4 to 6.\nxxxxoooo\n^    O moves 8 to 4, giving \"xxoxxooo\".  The two\nsurrounded x's are removed.\nxxoooo\n\nor\n\nxxxxoooo\n^       X moves 5 to 6.\nxxxxoooo\n^    O moves 8 to 5, giving \"xxxoxooo\".  The one\nsurrounded x is removed.\nxxxoooo\n\nBoth contingencies are quite bad for X -- but this is not captured by\nthe fact that they start out with X thinking his move will be harmless,\nhaving a payoff of zero.\n\nSo what's needed is for X to think more than one step ahead -- to\nconsider not merely what it can do in this move, and what the payoff\nis, but to consider what O might do in response, and the payoff of\nthose potential moves, and so on with X's possible responses to those\ncases could be.  All these possibilities form a game tree -- a tree\nwhere each node is a board, and its children are successors of that\nnode -- i.e., the boards that could result from every move possible,\ngiven the parent's board.\n\nBut how to represent the tree, and how to represent the nodes?\n\nWell, consider that a node holds several pieces of data:\n\n1) the configuration of the board, which, being nice and simple and\none-dimensional, can be stored as just a string, like \"xxxxoooo\".\n\n2) whose turn it is, X or O.  (Or: who moved last, from which we can\nfigure whose turn it is).\n\n3) the successors (child nodes).\n\n4) the immediate payoff of having moved to this board position from its\npredecessor (parent node).\n\n5) and what move gets us from our predecessor node to here.  (Granted,\nknowing the board configuration before and after the move, it's easy to\nfigure out the move; but it's easier still to store it as one is\nfiguring out a node's successors.)\n\n6) whatever else we might want to add later.\n\nThese could be stored equally well in an array or in a hash, but it's\nmy experience that hashes are best for cases where you have more than\njust two or three bits of data, or especially when you might need to\nadd new bits of data.  Moreover, hash key names are mnemonic --\n$node->{'lastmovepayoff'} is plain as day, whereas it's not so easy\nhaving to remember with an array that $node->[3] is where you decided\nto keep the payoff.\n\nFootnote: Of course, there are ways around that problem: just swear\nyou'll never use a real numeric index to access data in the array,\nand instead use constants with mnemonic names:\n\nuse strict;\nuse constant idxPAYOFF => 3;\n...\n$n->[idxPAYOFF]\n\nOr use a pseudohash.  But I prefer to keep it simple, and use a\nhash.\n\nThese are, incidentally, the same arguments that people weigh when\ntrying to decide whether their object-oriented modules should be\nbased on blessed hashes, blessed arrays, or what.  Essentially the\nonly difference here is that we're not blessing our nodes or\ntalking in terms of classes and methods.\n\n[end footnote]\n\nSo, we might as well represent nodes like so:\n\n$node = { # hashref\n'board'          => ...board string, e.g., \"xxxxoooo\"\n\n'lastmovepayoff' => ...payoff of the move\nthat got us here.\n\n'lastmovefrom' =>  ...the start...\n'lastmoveto'   =>  ...and end point of the move\nthat got us here.  E.g., 5 and 6,\nrepresenting a move from 5 to 6.\n\n'whoseturn'     => ...whose move it then becomes.\njust an 'x' or 'o'.\n\n'successors' => ...the successors\n};\n\nNote that we could have a field called something like 'lastmovewho'\nto denote who last moved, but since turns in Alak always alternate (and\nno-one can pass), storing whose move it is now and who last moved is\nredundant -- if X last moved, it's O turn now, and vice versa.  I chose\nto have a 'whoseturn' field instead of a 'lastmovewho', but it\ndoesn't really matter.  Either way, we'll end up inferring one from the\nother at several points in the program.\n\nWhen we want to store the successors of a node, should we use an array\nor a hash?  On the one hand, the successors to $node aren't essentially\nordered, so there's no reason to use an array per se; on the other\nhand, if we used a hash, with successor nodes as values, we don't have\nanything particularly meaningful to use as keys.  (And we can't use the\nsuccessors themselves as keys, since the nodes are referred to by hash\nreferences, and you can't use a reference as a hash key.)  Given no\nparticularly compelling reason to do otherwise, I choose to just use an\narray to store all a node's successors, although the order is never\nactually used for anything:\n\n$node = {\n...\n'successors' => [ ...nodes... ],\n...\n};\n\nIn any case, now that we've settled on what should be in a node, let's\nmake a little sample tree out of a few nodes and see what we can do\nwith it:\n\n# Board just before move 3 in above game\nmy $n0 = {\n'board' => 'xxxxoooo',\n'lastmovepayoff' => 0,\n'lastmovefrom' =>  9,\n'lastmoveto'   =>  7,\n'whoseturn' => 'x',\n'successors' => [],\n};\n\n# And, for now, just two of the successors:\n\n# X moves 4 to 6, giving xxxxoooo\nmy $n1 = {\n'board' => 'xxxxoooo',\n'lastmovepayoff' => 0,\n'lastmovefrom' =>  4,\n'lastmoveto'   =>  6,\n'whoseturn' => 'o',\n'successors' => [],\n};\n\n# or X moves 5 to 6, giving xxxxoooo\nmy $n2 = {\n'board' => 'xxxxoooo',\n'lastmovepayoff' => 0,\n'lastmovefrom' =>  5,\n'lastmoveto'   =>  6,\n'whoseturn' => 'o',\n'successors' => [],\n};\n\n# Now connect them...\npush @{$n0->{'successors'}}, $n1, $n2;\n\nDigression: Links to Parents\nIn comparing what we store in an Alak game tree node to what\nHTML::Element stores in HTML element nodes, you'll note one big\ndifference: every HTML::Element node contains a link to its parent,\nwhereas we don't have our Alak nodes keeping a link to theirs.\n\nThe reason this can be an important difference is because it can affect\nhow Perl knows when you're not using pieces of memory anymore.\nConsider the tree we just built, above:\n\nnode 0\n/      \\\nnode 1    node 2\n\nThere's two ways Perl knows you're using a piece of memory: 1) it's\nmemory that belongs directly to a variable (i.e., is necessary to hold\nthat variable's value, or values in the case of a hash or array), or 2)\nit's a piece of memory that something holds a reference to.  In the\nabove code, Perl knows that the hash for node 0 (for board\n\"xxxxoooo\") is in use because something (namely, the variable $n0)\nholds a reference to it.  Now, even if you followed the above code with\nthis:\n\n$n1 = $n2 = 'whatever';\n\nto make your variables $n1 and $n2 stop holding references to the\nhashes for the two successors of node 0, Perl would still know that\nthose hashes are still in use, because node 0's successors array holds\na reference to those hashes.  And Perl knows that node 0 is still in\nuse because something still holds a reference to it.  Now, if you\nadded:\n\nmy $root = $n0;\n\nThis would change nothing -- there's just be two things holding a\nreference to the node 0 hash, which in turn holds a reference to the\nnode 1 and node 2 hashes.  And if you then added:\n\n$n0 = 'stuff';\n\nstill nothing would change, because something ($root) still holds a\nreference to the node 0 hash.  But once nothing holds a reference to\nthe node 0 hash, Perl will know it can destroy that hash (and reclaim\nthe memory for later use, say), and once it does that, nothing will\nhold a reference to the node 1 or the node 2 hashes, and those will be\ndestroyed too.\n\nBut consider if the node 1 and node 2 hashes each had an attribute\n\"parent\" (or \"predecessor\") that held a reference to node 0.  If your\nprogram stopped holding a reference to the node 0 hash, Perl could not\nthen say that nothing holds a reference to node 0 -- because node 1 and\nnode 2 still do.  So, the memory for nodes 0, 1, and 2 would never get\nreclaimed (until your program ended, at which point Perl destroys\neverything).  If your program grew and discarded lots of nodes in the\ngame tree, but didn't let Perl know it could reclaim their memory, your\nprogram could grow to use immense amounts of memory -- never a nice\nthing to have happen.  There's three ways around this:\n\n1) When you're finished with a node, delete the reference each of its\nchildren have to it (in this case, deleting $n1->{'parent'}, say).\nWhen you're finished with a whole tree, just go through the whole tree\nerasing links that children have to their children.\n\n2) Reconsider whether you really need to have each node hold a\nreference to its parent.  Just not having those links will avoid the\nwhole problem.\n\n3) use the WeakRef module with Perl 5.6 or later.  This allows you to\n\"weaken\" some references (like the references that node 1 and 2 could\nhold to their parent) so that they don't count when Perl goes asking\nwhether anything holds a reference to a given piece of memory.  This\nwonderful new module eliminates the headaches that can often crop up\nwith either of the two previous methods.\n\nIt so happens that our Alak program is simple enough that we don't need\nfor our nodes to have links to their parents, so the second solution is\nfine.  But in a more advanced program, the first or third solutions\nmight be unavoidable.\n\nRecursively Printing the Tree\nI don't like working blind -- if I have any kind of a complex data\nstructure in memory for a program I'm working on, the first thing I do\nis write something that can dump that structure to the screen so I can\nmake sure that what I think is in memory really is what's in memory.\nNow, I could just use the \"x\" pretty-printer command in Perl's\ninteractive debugger, or I could have the program use the\n\"Data::Dumper\" module.  But in this case, I think the output from those\nis rather too verbose.  Once we have trees with dozens of nodes in\nthem, we'll really want a dump of the tree to be as concise as\npossible, hopefully just one line per node.  What I'd like is something\nthat can print $n0 and its successors (see above) as something like:\n\nxxxxoooo  (O moved 9 to 7, 0 payoff)\nxxxxoooo  (X moved 4 to 6, 0 payoff)\nxxxxoooo  (X moved 5 to 6, 0 payoff)\n\nA subroutine to print a line for a given node, and then do that again\nfor each successor, would look something like:\n\nsub dumptree {\nmy $n = $[0]; # \"n\" is for node\nprint\n...something expressing $n'n content...\nforeach my $s (@{$n->{'successors'}}) {\n# \"s for successor\ndump($s);\n}\n}\n\nAnd we could just start that out with a call to \"dumptree($n0)\".\n\nSince this routine...\n\nFootnote: I first wrote this routine starting out with \"sub dump\n{\".  But when I tried actually calling \"dump($n0)\", Perl would dump\ncore!  Imagine my shock when I discovered that this is absolutely\nto be expected -- Perl provides a built-in function called \"dump\",\nthe purpose of which is to, yes, make Perl dump core.  Calling our\nroutine \"dumptree\" instead of \"dump\" neatly avoids that problem.\n\n...does its work (dumping the subtree at and under the given node) by\ncalling itself, it's recursive.  However, there's a special term for\nthis kind of recursion across a tree: traversal.  To traverse a tree\nmeans to do something to a node, and to traverse its children.  There's\ntwo prototypical ways to do this, depending on what happens when:\n\ntraversing X in pre-order:\n* do something to X\n* then traverse X's children\n\ntraversing X in post-order:\n* traverse X's children\n* then do something to X\n\nDumping the tree to the screen the way we want it happens to be a\nmatter of pre-order traversal, since the thing we do (print a\ndescription of the node) happens before we recurse into the successors.\n\nWhen we try writing the \"print\" statement for our above \"dumptree\", we\ncan get something like:\n\nsub dumptree {\nmy $n = $[0];\n\n# \"xxxxoooo  (O moved 9 to 7, 0 payoff)\"\nprint\n$n->{'board'}, \"  (\",\n($n->{'whoseturn'} eq 'o' ? 'X' : 'O'),\n# Infer who last moved from whose turn it is now.\n\" moved \", $n->{'lastmovefrom'},\n\" to \",    $n->{'lastmoveto'},\n\", \",      $n->{'lastmovepayoff'},\n\" payoff)\\n\",\n;\n\nforeach my $s (@{$n->{'successors'}}) {\ndumptree($s);\n}\n}\n\nIf we run this on $n0 from above, we get this:\n\nxxxxoooo  (O moved 9 to 7, 0 payoff)\nxxxxoooo  (X moved 4 to 6, 0 payoff)\nxxxxoooo  (X moved 5 to 6, 0 payoff)\n\nEach line on its own is fine, but we forget to allow for indenting, and\nwithout that we can't tell what's a child of what.  (Imagine if the\nfirst successor had successors of its own -- you wouldn't be able to\ntell if it were a child, or a sibling.)  To get indenting, we'll need\nto have the instances of the \"dumptree\" routine know how far down in\nthe tree they're being called, by passing a depth parameter between\nthem:\n\nsub dumptree {\nmy $n = $[0];\nmy $depth = $[1];\n$depth = 0 unless defined $depth;\nprint\n\"  \" x $depth,\n...stuff...\nforeach my $s (@{$n->{'successors'}}) {\ndumptree($s, $depth + 1);\n}\n}\n\nWhen we call \"dumptree($n0)\", $depth (from $[1]) is undefined, so\ngets set to 0, which translates into an indenting of no spaces.  But\nwhen \"dumptree\" invokes itself on $n0's children, those instances see\n$depth + 1 as their $[1], giving appropriate indenting.\n\nFootnote: Passing values around between different invocations of a\nrecursive routine, as shown, is a decent way to share the data.\nAnother way to share the data is by keeping it in a global\nvariable, like $Depth, initially set to 0.  Each time \"dumptree\"\nis about to recurse, it must \"++$Depth\", and when it's back, it\nmust \"--$Depth\".\n\nOr, if the reader is familiar with closures, consider this\napproach:\n\nsub dumptree {\n# A wrapper around calls to a recursive closure:\nmy $startnode = $[0];\nmy $depth = 0;\n# to be shared across calls to $recursor.\nmy $recursor;\n$recursor = sub {\nmy $n = $[0];\nprint \"  \" x $depth,\n...stuff...\n++$depth;\nforeach my $s (@{$n->{'successors'}}) {\n$recursor->($s);\n}\n--$depth;\n}\n$recursor->($startnode); # start recursing\nundef $recursor;\n}\n\nThe reader with an advanced understanding of Perl's reference-\ncount-based garbage collection is invited to consider why it is\ncurrently necessary to undef $recursor (or otherwise change its\nvalue) after all recursion is done.\n\nThe reader whose mind is perverse in other ways is invited to\nconsider how (or when!) passing a depth parameter around is\nunnecessary because of information that Perl's caller(N) function\nreports!\n\n[end footnote]\n\nGrowing the Tree\nOur \"dumptree\" routine works fine for the sample tree we've got, so\nnow we should get the program working on making its own trees, starting\nfrom a given board.\n\nIn \"Games::Alak\" (the CPAN-released version of Alak that uses\nessentially the same code that we're currently discussing the tree-\nrelated parts of), there is a routine called \"figuresuccessors\" that,\ngiven one childless node, will figure out all its possible successors.\nThat is, it looks at the current board, looks at every piece belonging\nto the player whose turn it is, and considers the effect of moving each\npiece every possible way -- notably, it figures out the immediate\npayoff, and if that move would end the game, it notes that by setting\nan \"endgame\" entry in that node's hash.  (That way, we know that that's\na node that can't have successors.)\n\nIn the code for \"Games::Alak\", \"figuresuccessors\" does all these\nthings, in a rather straightforward way.  I won't walk you through the\ndetails of the \"figuresuccessors\" code I've written, since the code\nhas nothing much to do with trees, and is all just implementation of\nthe Alak rules for what can move where, with what result.  Especially\ninterested readers can puzzle over that part of code in the source\nlisting in the archive from CPAN, but others can just assume that it\nworks as described above.\n\nBut consider that \"figuresuccessors\", regardless of its inner\nworkings, does not grow the tree; it only makes one set of successors\nfor one node at a time.  It has to be up to a different routine to call\n\"figuresuccessors\", and to keep applying it as needed, in order to\nmake a nice big tree that our game-playing program can base its\ndecisions on.\n\nNow, we could do this by just starting from one node, applying\n\"figuresuccessors\" to it, then applying \"figuresuccessors\" on all the\nresulting children, and so on:\n\nsub grow {  # Just a first attempt at this!\nmy $n = $[0];\nfiguresuccessors($n);\nunless\n@{$n->{'successors'}}\n# already has successors.\nor $n->{'endgame'}\n# can't have successors.\n}\nforeach my $s (@{$n->{'successors'}}) {\ngrow($s); # recurse\n}\n}\n\nIf you have a game tree for tic-tac-toe, and you grow it without\nlimitation (as above), you will soon enough have a fully \"solved\" tree,\nwhere every node that can have successors does, and all the leaves of\nthe tree are all the possible endgames (where, in each case, the board\nis filled).  But a game of Alak is different from tic-tac-toe, because\nit can, in theory, go on forever.  For example, the following sequence\nof moves is quite possible:\n\nxxxxoooo\nxxxxoooo\nxxxxoooo\nxxxxoooo (x moved back)\nxxxxoooo (o moved back)\n...repeat forever...\n\nSo if you tried using our above attempt at a \"grow\" routine, Perl would\nhappily start trying to construct an infinitely deep tree, containing\nan infinite number of nodes, consuming an infinite amount of memory,\nand requiring an infinite amount of time.  As the old saying goes: \"You\ncan't have everything -- where would you put it?\"  So we have to place\nlimits on how much we'll grow the tree.\n\nThere's more than one way to do this:\n\n1. We could grow the tree until we hit some limit on the number of\nnodes we'll allow in the tree.\n\n2. We could grow the tree until we hit some limit on the amount of time\nwe're willing to spend.\n\n3. Or we could grow the tree until it is fully fleshed out to a certain\ndepth.\n\nSince we already know to track depth (as we did in writing\n\"dumptree\"), we'll do it that way, the third way.  The implementation\nfor that third approach is also pretty straightforward:\n\n$Maxdepth = 3;\nsub grow {\nmy $n = $[0];\nmy $depth = $[1] || 0;\nfiguresuccessors($n)\nunless\n$depth >= $Maxdepth\nor @{$n->{'successors'}}\nor $n->{'endgame'}\n}\nforeach my $s (@{$n->{'successors'}}) {\ngrow($s, $depth + 1);\n}\n# If we're at $Maxdepth, then figuresuccessors\n#  didn't get called, so there's no successors\n#  to recurse under -- that's what stops recursion.\n}\n\nIf we start from a single node (whether it's a node for the starting\nboard \"xxxxoooo\", or for whatever board the computer is faced with),\nset $Maxdepth to 4, and apply \"grow\" to it, it will grow the tree to\ninclude several hundred nodes.\n\nFootnote: If at each move there are four pieces that can move, and\nthey can each move right or left, the \"branching factor\" of the\ntree is eight, giving a tree with 1 (depth 0) + 8 (depth 1) + 8\n2 + 8  3 + 8  4  = 4681 nodes in it.  But, in practice, not all\npieces can move in both directions (none of the x pieces in\n\"xxxxoooo\" can move left, for example), and there may be fewer\nthan four pieces, if some were lost.  For example, there are 801\nnodes in a tree of depth four starting from \"xxxxoooo\",\nsuggesting an average branching factor of about five (801  (1/4)\nis about 5.3), not eight.\n\nWhat we need to derive from that tree is the information about what are\nthe best moves for X.  The simplest way to consider the payoff of\ndifferent successors is to just average them -- but what we average\nisn't always their immediate payoffs (because that'd leave us using\nonly one generation of information), but the average payoff of their\nsuccessors, if any.  We can formalize this as:\n\nTo figure a node's average payoff:\nIf the node has successors:\nFigure each successor's average payoff.\nMy average payoff is the average of theirs.\nOtherwise:\nMy average payoff is my immediate payoff.\n\nSince this involves recursing into the successors before doing anything\nwith the current node, this will traverse the tree in post-order.\n\nWe could work that up as a routine of its own, and apply that to the\ntree after we've applied \"grow\" to it.  But since we'd never grow the\ntree without also figuring the average benefit, we might as well make\nthat figuring part of the \"grow\" routine itself:\n\n$Maxdepth = 3;\nsub grow {\nmy $n = $[0];\nmy $depth = $[1] || 0;\nfiguresuccessors($n);\nunless\n$depth >= $Maxdepth\nor @{$n->{'successors'}}\nor $n->{'endgame'}\n}\n\nif(@{$n->{'successors'}}) {\nmy $apayoffsum = 0;\nforeach my $s (@{$n->{'successors'}}) {\ngrow($s, $depth + 1);  # RECURSE\n$apayoffsum += $s->{'averagepayoff'};\n}\n$n->{'averagepayoff'}\n= $apayoffsum / @{$n->{'successors'}};\n} else {\n$n->{'averagepayoff'}\n= $n->{'lastmovepayoff'};\n}\n}\n\nSo, by time \"grow\" has applied to a node (wherever in the tree it is),\nit will have figured successors if possible (which, in turn, sets\n\"lastmovepayoff\" for each node it creates), and will have set\n\"averagebenefit\".\n\nBeyond this, all that's needed is to start the board out with a root\nnote of \"xxxxoooo\", and have the computer (X) take turns with the\nuser (O) until someone wins.  Whenever it's O's turn, \"Games::Alak\"\npresents a prompt to the user, letting him know the state of the\ncurrent board, and asking what move he selects.  When it's X's turn,\nthe computer grows the game tree as necessary (using just the \"grow\"\nroutine from above), then selects the move with the highest average\npayoff (or one of the highest, in case of a tie).\n\nIn either case, \"selecting\" a move means just setting that move's node\nas the new root of the program's game tree.  Its sibling nodes and\ntheir descendants (the boards that didn't get selected) and its parent\nnode will be erased from memory, since they will no longer be in use\n(as Perl can tell by the fact that nothing holds references to them\nanymore).\n\nThe interface code in \"Games::Alak\" (the code that prompts the user for\nhis move) actually supports quite a few options besides just moving --\nincluding dumping the game tree to a specified depth (using a slightly\nfancier version of \"dumptree\", above), resetting the game, changing\n$Maxdepth in the middle of the game, and quitting the game.  Like\n\"figuresuccessors\", it's a bit too long to print here, but interested\nusers are welcome to peruse (and freely modify) the code, as well as to\nenjoy just playing the game.\n\nNow, in practice, there's more to game trees than this: for games with\na larger branching factor than Alak has (which is most!), game trees of\ndepth four or larger would contain too many nodes to be manageable,\nmost of those nodes being strategically quite uninteresting for either\nplayer; dealing with game trees specifically is therefore a matter of\nrecognizing uninteresting contingencies and not bothering to grow the\ntree under them.\n\nFootnote: For example, to choose a straightforward case: if O has a\nchoice between moves that put him in immediate danger of X winning\nand moves that don't, then O won't ever choose the dangerous moves\n(and if he does, the computer will know enough to end the game), so\nthere's no point in growing the tree any further beneath those\nnodes.\n\nBut this sample implementation should illustrate the basics of how to\nbuild and manipulate a simple tree structure in memory.  And once\nyou've understood the basics of tree storage here, you should be ready\nto better understand the complexities and peculiarities of other\nsystems for creating, accessing, and changing trees, including\nTree::DAGNode, HTML::Element, XML::DOM, or related formalisms like\nXPath and XSL.\n\n[end body of article]\n\n[Author Credit]\nSean M. Burke (\"sburke@cpan.org\") is a tree-dwelling hominid.\n\nReferences\nDewdney, A[lexander] K[eewatin].  1984.  Planiverse: Computer Contact\nwith a Two-Dimensional World.  Poseidon Press, New York.\n\nKnuth, Donald Ervin.  1997.  Art of Computer Programming, Volume 1,\nThird Edition: Fundamental Algorithms.  Addison-Wesley,  Reading, MA.\n\nWirth, Niklaus.  1976.  Algorithms + Data Structures = Programs\nPrentice-Hall, Englewood Cliffs, NJ.\n\nWorth, Stan and Allman Sheldon.  Circa 1967.  George of the Jungle\ntheme.  [music by Jay Ward.]\n\nWirth's classic, currently and lamentably out of print, has a good\nsection on trees.  I find it clearer than Knuth's (if not quite as\nencyclopedic), probably because Wirth's example code is in a block-\nstructured high-level language (basically Pascal), instead of in\nassembler (MIX).  I believe the book was re-issued in the 1980s under\nthe titles Algorithms and Data Structures and, in a German edition,\nAlgorithmen und Datenstrukturen.  Cheap copies of these editions should\nbe available through used book services such as \"abebooks.com\".\n\nWorth's classic, however, is available on the soundtrack to the 1997\nGeorge of the Jungle movie, as performed by The Presidents of the\nUnited States of America.\n",
            "subsections": []
        },
        "BACK": {
            "content": "Return to the HTML::Tree docs.\n\nperl v5.28.1                      2019-01-13       HTML::Tree::AboutTrees(3pm)",
            "subsections": []
        }
    },
    "summary": "HTML::Tree::AboutTrees -- article on tree-shaped data structures in Perl",
    "flags": [],
    "examples": [],
    "see_also": []
}