Re: Good XML Parser

"David Otton" <phpmail@xxxxxxxxxxxxxxxxxxxxxxx> · Mon, 12 May 2008 12:23:41 +0000

2008/5/12 Waynn Lue <waynnlue@xxxxxxxxx>:
> So if I'm looking to parse certain attributes out of an XML tree, if I
> use SAX, it seems that I would need to keep track of state internally.
>  E.g., if I have a tree like
>
> <head>
>  <a>
>   <b></b>
>  </a>
>  <a>
>    <b></b>
>  </a>
> </head>
>
> and say I'm interested in all that's between <b> underneath any <a>,
> I'd need to have a state machine that looked for an <a> followed by a
> <b>.  If I'm doing that, though, it seems like I should just start
> using a DOM parser instead?

Yeah, I think you've got it nailed, although your example is simple
enough (you're only holding one state value - "am I a child of <a>?")
that I'd probably still reflexively reach for the lightweight
solution. I use SAX for lightweight hacks, one step up from regexes -
I know the information I want is between <tag> and </tag>, and I don't
care about the rest of the document. The more I need to navigate the
document, the more likely I am to use DOM. I could build my own data
structures on top of a SAX parser, but why bother reinventing the
wheel? Of course, you have to factor document size into that - parsing
a big XML document into a tree can be slow, and the whole tree has to
fit in memory.
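
To make the "one state value" point concrete, here is a minimal sketch of
a SAX handler that collects the text of every <b> sitting directly under
an <a>. It is written in Python's standard xml.sax module rather than PHP
purely for illustration; the class name, the sample document and its text
values are assumptions, and the sketch assumes <b> elements are not nested
inside each other.

```python
import xml.sax


class BInsideA(xml.sax.ContentHandler):
    """Collects the text of every <b> that is a direct child of an <a>."""

    def __init__(self):
        super().__init__()
        self.stack = []     # names of the currently open elements
        self.buffer = None  # text pieces while inside a matching <b>, else None
        self.found = []     # text of each matching <b>

    def startElement(self, name, attrs):
        # The one piece of state we care about: is the parent an <a>?
        if name == "b" and self.stack and self.stack[-1] == "a":
            self.buffer = []
        self.stack.append(name)

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        self.stack.pop()
        if name == "b" and self.buffer is not None:
            self.found.append("".join(self.buffer))
            self.buffer = None


# Hypothetical sample: same shape as the quoted tree, with text added
# and one <b> outside any <a> to show that it is skipped.
SAMPLE = "<head><a><b>first</b></a><a><b>second</b></a><b>ignored</b></head>"

handler = BInsideA()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)
print(handler.found)
```

The handler never sees the document as a whole; it only remembers which
elements are currently open, which is why this stays cheap on large input.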

You might also want to explore XPath
(http://uk.php.net/manual/en/function.simplexml-element-xpath.php and
http://uk.php.net/manual/en/class.domxpath.php). XPath is to XML as
regexes are to text files. There's a good chance you'll be able to
roll all your parsing up into a couple of XPath queries.
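
As a rough illustration of how much a single query can replace, the
sketch below pulls out every <b> directly under an <a> with one path
expression. It uses Python's xml.etree.ElementTree, which supports only a
subset of XPath, so this is an assumption-laden stand-in for the PHP
functions linked above, not their API; in full XPath the same selection
would be written //a/b. The sample document and its text values are made
up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: same shape as the quoted tree, with text added
# and one <b> outside any <a> to show that it is not selected.
SAMPLE = "<head><a><b>first</b></a><a><b>second</b></a><b>ignored</b></head>"

root = ET.fromstring(SAMPLE)

# ".//a/b" selects every <b> that is a direct child of any <a> below the root.
texts = [b.text for b in root.findall(".//a/b")]
print(texts)
```

Note that this builds the whole tree first, so it trades the memory cost
of DOM-style parsing for not having to write any state tracking at all.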

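I probably should have added that simple parsers come in two flavours
- Push Parsers and Pull Parsers. I tend to think (lazily) of Push and
Pull as variations on SAX, but strictly speaking they are different: a
push parser (SAX is one) calls your handlers as it reads the document,
while a pull parser waits for your code to ask for the next event.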
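
For contrast with callback-style SAX, here is a minimal pull-style
sketch: the loop asks the parser for the next event instead of the parser
calling into handlers. It uses Python's xml.dom.pulldom purely for
illustration (not a PHP API); the sample document, its text values and
the variable names are assumptions, and <b> elements are assumed not to
be nested inside each other.

```python
from xml.dom import pulldom

# Hypothetical sample: same shape as the quoted tree, with text added
# and one <b> outside any <a> to show that it is skipped.
SAMPLE = "<head><a><b>first</b></a><a><b>second</b></a><b>ignored</b></head>"

found = []      # text of each <b> found directly under an <a>
open_tags = []  # names of the currently open elements
text = None     # text pieces while inside a matching <b>, else None

# Our loop drives the parser: each iteration pulls the next event.
for event, node in pulldom.parseString(SAMPLE):
    if event == pulldom.START_ELEMENT:
        if node.tagName == "b" and open_tags and open_tags[-1] == "a":
            text = []
        open_tags.append(node.tagName)
    elif event == pulldom.CHARACTERS:
        if text is not None:
            text.append(node.data)
    elif event == pulldom.END_ELEMENT:
        open_tags.pop()
        if node.tagName == "b" and text is not None:
            found.append("".join(text))
            text = None

print(found)
```

The state being tracked is the same as in a SAX handler; what changes is
who is in control, which makes it easy to stop early or skip ahead.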
