Gotcha, wasn't thinking straight. Turns out it doesn't really have to be
a legal HTML attribute anyway, so I can just do:

str_replace('<th','<th fakeattr="blah" ',$str)

On Thu, Mar 11, 2010 at 3:01 PM, Rene Veerman <rene7705@xxxxxxxxx> wrote:
> So in other words; it's the library that you fix with wrapper
> functions, not the reports (outside the scope of using the library).
>
> On Thu, Mar 11, 2010 at 9:59 PM, Rene Veerman <rene7705@xxxxxxxxx> wrote:
>> function readyForDOM_report($originalReportAsText) {
>>   return str_replace ('<th', '<th class="transportTH"', $originalReportAsText);
>> }
>>
>> $dom = new DOMDocument();
>> $dom->loadHTML(readyForDOM_report($str));
>> $tables = $dom->getElementsByTagName("table");
>> $rows = $tables->item(0)->getElementsByTagName('tr');
>> foreach($rows as $row){
>>   foreach($row->childNodes as $node) {
>>     // check $node for having a classname 'transportTH'.
>>   }
>> }
>>
>> the only problem i foresee is <th>s in your reports already having a
>> class="something" set, which could mess it up. you'd need to check
>> that. but in that case you can always pump the original $str to the
>> DOM, and use multiple $k's from foreach ($arr as $k=>$v) to get to the
>> corresponding node, and have the original class name.
>>
>> On Thu, Mar 11, 2010 at 9:52 PM, Andy Theuninck <gohanman@xxxxxxxxx> wrote:
>>> I could, but that would kind of defeat the point of the project
>>> (I'm trying to capture a bunch of existing HTML reports via output
>>> buffering and transform the tables into proper XLS. Tweaking every
>>> single report is exactly what I'm trying to avoid).
>>>
>>> On Thu, Mar 11, 2010 at 2:45 PM, Rene Veerman <rene7705@xxxxxxxxx> wrote:
>>>> hmm lame bug... but you can add a classname to the <th>s and check for that?..
>>>>
>>>> On Thu, Mar 11, 2010 at 9:34 PM, Andy Theuninck <gohanman@xxxxxxxxx> wrote:
>>>>> I'm trying to parse a string containing an HTML table using the
>>>>> builtin DOM classes and running into an odd problem.
>>>>>
>>>>> Here's what I'm doing:
>>>>> $dom = new DOMDocument();
>>>>> $dom->loadHTML($str);
>>>>> $tables = $dom->getElementsByTagName("table");
>>>>> $rows = $tables->item(0)->getElementsByTagName('tr');
>>>>> foreach($rows as $row){
>>>>>   foreach($row->childNodes as $node) {
>>>>>     // stuff
>>>>>   }
>>>>> }
>>>>>
>>>>> This gives me the row elements in order and access to their contents.
>>>>> The weird part is $node always appears to be a td tag - even when it's
>>>>> a th tag in the original string (DOMElement::tagName is always "td"
>>>>> (as well as DOMNode::nodeName and DOMNode::localName)). The th tags
>>>>> definitely aren't being omitted; I still get nodes with their
>>>>> contents, just with the wrong tag name.
>>>>>
>>>>> Is there any way to override this behavior so that I can distinguish
>>>>> between td tags and th tags?
>>>>>
>>>>> --
>>>>> PHP General Mailing List (http://www.php.net/)
>>>>> To unsubscribe, visit: http://www.php.net/unsub.php
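
A rough sketch of the marker-attribute workaround discussed in this thread, for illustration only. The helper names markHeaderCells() and extractCells() and the attribute name data-was-th are made up here; they are not from the thread. One deliberate difference from the one-liner at the top: a plain str_replace() on '<th' would also rewrite <thead> tags, so this version assumes a regex that only matches a real th tag is acceptable.

```php
<?php
// Hypothetical helper: stamp every <th> with a marker attribute before parsing.
// "data-was-th" is an assumed name; any attribute the reports never use would do.
function markHeaderCells(string $html): string
{
    // The lookahead requires whitespace, '>' or '/' after "<th",
    // so "<thead" is left alone.
    return preg_replace('/<th(?=[\s>\/])/i', '<th data-was-th="1"', $html);
}

// Hypothetical helper: returns the first table as rows of
// ['header' => bool, 'text' => string] cells.
function extractCells(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them;
    // real-world report markup (and the marker attribute) may trigger some.
    libxml_use_internal_errors(true);
    $dom->loadHTML(markHeaderCells($html));
    libxml_clear_errors();

    $table = $dom->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return [];
    }

    $result = [];
    foreach ($table->getElementsByTagName('tr') as $row) {
        $cells = [];
        foreach ($row->childNodes as $node) {
            // childNodes can include whitespace text nodes; skip those.
            if (!($node instanceof DOMElement)) {
                continue;
            }
            $cells[] = [
                // Rely on the marker, not on tagName, to spot header cells.
                'header' => $node->hasAttribute('data-was-th'),
                'text'   => trim($node->textContent),
            ];
        }
        $result[] = $cells;
    }
    return $result;
}

$rows = extractCells(
    '<table><thead><tr><th>Item</th><th class="x">Qty</th></tr></thead>'
    . '<tbody><tr><td>Apples</td><td>3</td></tr></tbody></table>'
);
```

Because the marker is a separate attribute rather than a class, <th> cells that already carry class="something" keep their original class untouched, which sidesteps the collision Rene mentioned.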