Re: GetElementByClass?

Ashley Sheridan <ash@xxxxxxxxxxxxxxxxxxxx> · Sat, 03 Apr 2010 15:58:44 +0100

On Sat, 2010-04-03 at 10:29 -0400, tedd wrote:

> Hi gang:
> 
> Here's the problem.
> 
> I have 184 HTML pages in a directory and each page contains a 
> question. The question is noted in the HTML DOM like so:
> 
> <p class="question">
>    Who is Roger Rabbit?
> </p>
> 
> My question is -- how can I extract the string "Who is Roger Rabbit?" 
> from each page using php? You see, I want to store the questions in a 
> database without having to re-type, or cut/paste, each one.
> 
> Now, I can extract each question by using javascript --
> 
> document.getElementById("question").innerHTML;
> 
> -- and stepping through each page, but I don't want to use javascript for this.
> 
> I have not found/created a working example of this using PHP. I tried 
> using PHP's getElementById(), but that requires the target file to be 
> valid XML and the string to be contained within an ID and not a 
> class. These pages do not meet either requirement.
> 
> Additionally, I realize that I can load the files and parse out what 
> is between the <p> tags, but I was hoping for a "GetElementByClass" 
> way to do this.
> 
> So, is there one?
> 
> Thanks,
> 
> tedd
> -- 
> -------
> http://sperling.com  http://ancientstones.com  http://earthstones.com
> 

I don't think PHP's DOM extension has a getElementsByClass function.
HTML5 is proposing one (getElementsByClassName), but that will most
likely be implemented in JavaScript before PHP's DOM. There is a way to
tidy up the HTML to make it XHTML, but I'm not sure what it is. If you
know roughly where in the document the HTML snippet is, you can use
XPath to grab it.
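
For what it's worth, here is a rough sketch of the XPath route. It
assumes PHP 7 or later with the DOM extension enabled, and it relies on
DOMDocument::loadHTML(), which accepts ordinary (non-XML) HTML, so the
tidy-to-XHTML step may not be needed at all. The function name
extractQuestions() is made up for this example and is not part of PHP.

```php
<?php
// Hypothetical helper: returns the text of every <p class="question">.
function extractQuestions(string $html): array
{
    $doc = new DOMDocument();

    // loadHTML() copes with HTML that is not valid XML; collect its
    // parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Match "question" as a whole class token, so that
    // class="question extra" matches but class="questions" does not.
    $query = '//p[contains(concat(" ", normalize-space(@class), " "), " question ")]';

    $questions = [];
    foreach ($xpath->query($query) as $node) {
        // textContent drops any nested markup; trim() removes the
        // surrounding newlines and indentation.
        $questions[] = trim($node->textContent);
    }

    return $questions;
}
```

To cover all 184 pages you would presumably loop over the files (with
glob(), say), pass each one through file_get_contents(), and hand the
result to the function before inserting into the database.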

Failing that, what about a regex? It shouldn't be too hard to write a
regex to match your example above.
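
Something along these lines might do as a starting point. It is only a
sketch: it assumes PHP 7.1 or later, that the tag is written exactly as
<p class="question"> with double quotes and no other attributes, and
that there is one question per page. The function name
extractQuestionWithRegex() is invented for the example.

```php
<?php
// Hypothetical helper: returns the first question found, or null.
function extractQuestionWithRegex(string $html): ?string
{
    // "i" ignores case, "s" lets "." span newlines, and ".*?" is
    // non-greedy so the match stops at the first closing </p>.
    $pattern = '~<p\s+class="question"\s*>(.*?)</p>~is';

    if (preg_match($pattern, $html, $matches) === 1) {
        // Remove any inline tags and the surrounding whitespace.
        return trim(strip_tags($matches[1]));
    }

    return null;
}
```

A regex like this breaks as soon as the markup varies (extra
attributes, single quotes, nested paragraphs), so it is only worth
using if all the pages follow the same template.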

Thanks,
Ash
http://www.ashleysheridan.co.uk