function readyForDOM_report($originalReportAsText) {
    // \b keeps <thead> from being tagged along with <th>
    return preg_replace('/<th\b/i', '<th class="transportTH"', $originalReportAsText);
}

$dom = new DOMDocument();
$dom->loadHTML(readyForDOM_report($str));
$tables = $dom->getElementsByTagName("table");
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach ($rows as $row) {
    foreach ($row->childNodes as $node) {
        // check $node for having a classname 'transportTH'.
    }
}

The only problem I foresee is <th>s in your reports already having a
class="something" set, which could mess it up. You'd need to check for that.

But in that case you can always load the original $str into a DOM as well,
use the keys ($k) from foreach ($arr as $k => $v) to get to the
corresponding node, and read the original class name from there.

On Thu, Mar 11, 2010 at 9:52 PM, Andy Theuninck <gohanman@xxxxxxxxx> wrote:
> I could, but that would kind of defeat the point of the project
> (I'm trying to capture a bunch of existing HTML reports via output
> buffering and transform the tables into proper XLS. Tweaking every
> single report is exactly what I'm trying to avoid).
>
> On Thu, Mar 11, 2010 at 2:45 PM, Rene Veerman <rene7705@xxxxxxxxx> wrote:
>> hmm lame bug... but you can add a classname to the <th>s and check for that?..
>>
>> On Thu, Mar 11, 2010 at 9:34 PM, Andy Theuninck <gohanman@xxxxxxxxx> wrote:
>>> I'm trying to parse a string containing an HTML table using the
>>> builtin DOM classes and running into an odd problem.
>>>
>>> Here's what I'm doing:
>>> $dom = new DOMDocument();
>>> $dom->loadHTML($str);
>>> $tables = $dom->getElementsByTagName("table");
>>> $rows = $tables->item(0)->getElementsByTagName('tr');
>>> foreach($rows as $row){
>>>     foreach($row->childNodes as $node)
>>>         // stuff
>>> }
>>>
>>> This gives me the row elements in order and access to their contents.
>>> The weird part is $node always appears to be a td tag - even when it's
>>> a th tag in the original string (DOMElement::tagName is always "td"
>>> (as well as DOMNode::nodeName and DOMNode::localName)). The th tags
>>> definitely aren't being omitted; I still get nodes with their
>>> contents, just with the wrong tag name.
>>>
>>> Is there any way to override this behavior so that I can distinguish
>>> between td tags and th tags?
>>>
>>> --
>>> PHP General Mailing List (http://www.php.net/)
>>> To unsubscribe, visit: http://www.php.net/unsub.php
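
To fill in the class check that the loop at the top of this message leaves as a comment, here is a minimal sketch. The helper name classifyCells and its return shape are made up for illustration, and libxml_use_internal_errors() is only there on the assumption that the captured reports contain sloppy markup that would otherwise trigger warnings. The marker step uses preg_replace with \b so that <thead> is not tagged as well. It does not handle <th>s that already carry a class attribute.

```php
<?php
// Hypothetical helper: returns, per table row, a list of cells with the
// tag recovered from the marker class rather than from $node->tagName.
function classifyCells(string $html): array
{
    // Tag header cells with a marker class; \b stops <thead> from matching.
    $marked = preg_replace('/<th\b/i', '<th class="transportTH"', $html);

    $dom = new DOMDocument();
    // Assumption: report markup may be untidy, so collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($marked);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    $table = $dom->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return $result; // no table in the input
    }

    foreach ($table->getElementsByTagName('tr') as $row) {
        $cells = [];
        foreach ($row->childNodes as $node) {
            // childNodes also holds whitespace text nodes; only elements are cells.
            if (!($node instanceof DOMElement)) {
                continue;
            }
            $isHeader = $node->getAttribute('class') === 'transportTH';
            $cells[] = [
                'tag'  => $isHeader ? 'th' : 'td',
                'text' => trim($node->textContent),
            ];
        }
        $result[] = $cells;
    }
    return $result;
}
```

Calling classifyCells($str) on the buffered report then gives one array per <tr>, and the XLS writer can look at each cell's 'tag' entry instead of the tag name reported by the DOM.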