On Sat, 2010-04-03 at 17:03 +0200, dispy wrote:

> On 03.04.2010 16:29, tedd wrote:
> > Hi gang:
> >
> > Here's the problem.
> >
> > I have 184 HTML pages in a directory and each page contains a question.
> > The question is noted in the HTML DOM like so:
> >
> > <p class="question">
> > Who is Roger Rabbit?
> > </p>
> >
> > My question is -- how can I extract the string "Who is Roger Rabbit?"
> > from each page using php? You see, I want to store the questions in a
> > database without having to re-type, or cut/paste, each one.
> >
> > Now, I can extract each question by using javascript --
> >
> > document.getElementById("question").innerHTML;
> >
> > -- and stepping through each page, but I don't want to use javascript
> > for this.
> >
> > I have not found/created a working example of this using PHP. I tried
> > using PHP's getElementByID(), but that requires the target file to be
> > valid xml and the string to be contained within an ID and not a class.
> > These pages do not support either requirement.
> >
> > Additionally, I realize that I can load the files and parse out what is
> > between the <p> tags, but I was hoping for a "GetElementByClass" way to
> > do this.
> >
> > So, is there one?
> >
> > Thanks,
> >
> > tedd
>
> Why don't you just use a regex? I don't know of any way to easily
> process content that isn't valid XML/XHTML, simply because there's no
> library to load it (but correct me if I'm wrong).
>
> I'm not a regex expert, but I think the following would do it:
> /\<p\s*class\=\"question\"\s*\>(.*)\<\/p\>/
>
> (This is my first contribution here; please excuse me if something went wrong.)
>
> Regards,
>
> Valentin Dreismann
>

The . won't match newline characters, so you'll have to add those in too.

Thanks,
Ash
http://www.ashleysheridan.co.uk
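To make Ash's point concrete, here is a minimal sketch, assuming PHP 7 or later and pages that mark each question exactly as in tedd's example. The regex function adapts dispy's pattern: the `s` modifier lets `.` also match newlines, `i` ignores tag case, and the lazy `(.*?)` stops at the first `</p>` instead of running on to the last one on the page. The DOM function is an alternative not proposed in the thread: `DOMDocument::loadHTML()` parses HTML that is not valid XML, and `DOMXPath` can then select `<p>` elements by their class attribute, which is close to the "GetElementByClass" tedd asked for. The function names and the directory layout (`*.html` files in one folder) are hypothetical.

```php
<?php
// Sketch only: assumes PHP 7+ and markup like <p class="question"> ... </p>.
// All function names below are hypothetical.

// Regex approach with the newline fix: "s" lets "." match "\n", "i" ignores
// tag case, and the lazy "(.*?)" stops at the first closing </p>.
function extractQuestionsRegex(string $html): array
{
    preg_match_all('/<p\s+class="question"\s*>(.*?)<\/p>/is', $html, $matches);
    // $matches[1] holds capture group 1 for every match.
    return array_map('trim', $matches[1]);
}

// DOM approach: loadHTML() accepts HTML that is not well-formed XML, and
// DOMXPath can then select <p> elements by class.
function extractQuestionsDom(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence malformed-HTML warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Matches class="question" as well as class="intro question".
    $nodes = $xpath->query(
        "//p[contains(concat(' ', normalize-space(@class), ' '), ' question ')]"
    );

    $questions = [];
    foreach ($nodes as $node) {
        $questions[] = trim($node->textContent);
    }
    return $questions;
}

// Walk a directory of pages; $dir is a placeholder for wherever they live.
function extractQuestionsFromDir(string $dir): array
{
    $questions = [];
    foreach (glob($dir . '/*.html') ?: [] as $file) {
        $html = file_get_contents($file);
        if ($html === false) {
            continue; // unreadable file: skip it
        }
        foreach (extractQuestionsDom($html) as $q) {
            $questions[basename($file)][] = $q;
        }
    }
    return $questions;
}
```

The regex version is tied to the exact markup (attribute order, double quotes, a single class), whereas the XPath expression also matches a `<p>` that carries additional classes. For a folder of hand-written pages the DOM route is probably the more forgiving of the two, though either should be enough for a one-off import into the database.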