On 7/8/10, Richard Quadling wrote:
> On 8 July 2010 16:15, Gary wrote:
>> Okay. At least one of the problems with this so called HTML seems to
>> be that the body tag looks like
>> <BODY vlink=#ffffff ...>
>> and xml_parse complains that "> required" on that line (i.e. it is
>> claiming it can't find the end of the tag!).
>>
>> I'm guessing that those attributes "must" be quoted in XML and
>> "should" be in HTML (but patently aren't)? Is there any way to get
>> xml_parse to ignore that? My element_handler functions never even get
>> a chance to see that line.
>>
>> Regex to insert quotes or remove the attributes entirely, perhaps?
>> *gulp* I hope there's a better way than that.
>
> So. Essentially, you want to parse some plain text which may or may
> not be well formed XML.

No. I don't *want* to.... And it isn't plain text, it's just sh*t html
(no doctype, missing closing tags in some cases, etc. It's an absolute
mess). Browsers are pretty good at handling it. XML parsers... less so.

> How badly formed is the file going to be?

It's not a file. It comes from an embedded web server on a device. I
could ask them to change it. I can hear the laughter already.

> If it is things like missing ", then this could be managed with regex.
> Essentially you are going to have to do the clean up that Tidy could
> do for you.

Yeah :(

--
PHP General Mailing List (http://www.php.net/)
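
One alternative to regex cleanup, not raised in the thread above, is PHP's
DOM extension: DOMDocument::loadHTML() uses libxml's HTML parser, which
tolerates unquoted attributes and missing closing tags. A minimal sketch,
assuming the DOM extension is available; the sample markup and the function
name parse_tag_soup are made up for illustration, not taken from the device:

```php
<?php
// Hypothetical helper: parse messy HTML with libxml's lenient HTML parser
// instead of the strict XML parser behind xml_parse().
function parse_tag_soup(string $html): DOMDocument
{
    // Collect parse warnings internally rather than emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected warnings and restore the earlier setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// Made-up sample standing in for the device's output:
// unquoted attribute, no doctype, unclosed <P> tags.
$html = '<HTML><BODY vlink=#ffffff><P>Status: OK<P>Uptime: 3 days</BODY>';

$doc  = parse_tag_soup($html);
$body = $doc->getElementsByTagName('body')->item(0);

// The unquoted attribute is readable like any other.
echo $body->getAttribute('vlink'), "\n";

// The parser closes each <P> implicitly, so both paragraphs are present.
foreach ($doc->getElementsByTagName('p') as $p) {
    echo trim($p->textContent), "\n";
}
```

This trades the SAX-style element_handler callbacks for a DOM tree, so the
existing handlers would have to be reworked as tree walks; whether that is
acceptable depends on how large the device's pages are.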