On Sat, 2010-04-03 at 10:29 -0400, tedd wrote:

> Hi gang:
>
> Here's the problem.
>
> I have 184 HTML pages in a directory and each page contains a
> question. The question is noted in the HTML DOM like so:
>
> <p class="question">
> Who is Roger Rabbit?
> </p>
>
> My question is -- how can I extract the string "Who is Roger Rabbit?"
> from each page using php? You see, I want to store the questions in a
> database without having to re-type, or cut/paste, each one.
>
> Now, I can extract each question by using javascript --
>
> document.getElementById("question").innerHTML;
>
> -- and stepping through each page, but I don't want to use javascript
> for this.
>
> I have not found/created a working example of this using PHP. I tried
> using PHP's getElementByID(), but that requires the target file to be
> valid xml and the string to be contained within an ID and not a
> class. These pages do not support either requirement.
>
> Additionally, I realize that I can load the files and parse out what
> is between the <p> tags, but I was hoping for a "GetElementByClass"
> way to do this.
>
> So, is there one?
>
> Thanks,
>
> tedd
> --
> -------
> http://sperling.com http://ancientstones.com http://earthstones.com
>

I don't think there is a getElementsByClass function. HTML5 is
proposing one, but that will most likely be implemented in JavaScript
before it reaches PHP's DOM extension.

There is a way to tidy up the HTML to make it XHTML, but I'm not sure
what it is. If you know roughly where in the document the HTML snippet
is, you can use XPath to grab it.

Failing that, what about a regex? It shouldn't be too hard to write a
regex to match your example above.

Thanks,
Ash
http://www.ashleysheridan.co.uk
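
For what it's worth, the XPath idea mentioned above could look roughly like the sketch below. It is only an illustration under some assumptions: extractQuestions() is a made-up helper name, the "pages" directory is a placeholder, and it relies on DOMDocument::loadHTML() (which accepts HTML that is not valid XML) together with DOMXPath to select paragraphs by class.

```php
<?php
// Hypothetical helper (not a built-in): returns the text of every
// <p class="question"> element found in an HTML string.
function extractQuestions($html)
{
    $doc = new DOMDocument();

    // loadHTML() copes with non-XHTML markup; collect its parse
    // warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Match "question" as a whole word, so class="question big" also
    // matches but class="questionnaire" does not.
    $query = '//p[contains(concat(" ", normalize-space(@class), " "), " question ")]';

    $questions = array();
    foreach ($xpath->query($query) as $node) {
        $questions[] = trim($node->textContent);
    }
    return $questions;
}

// Usage sketch: the directory name "pages" is an assumption.
$files = glob('pages/*.html');
foreach ($files ? $files : array() as $file) {
    foreach (extractQuestions(file_get_contents($file)) as $question) {
        echo $file, ': ', $question, PHP_EOL;
    }
}
```

From there each string can be inserted into the database in the inner loop. A regex would also work for markup as regular as the example, but the DOM route is less likely to break if attributes or whitespace vary between pages.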