On 03.04.2010 16:29, tedd wrote:
> Hi gang:
>
> Here's the problem.
>
> I have 184 HTML pages in a directory and each page contains a question.
> The question is noted in the HTML DOM like so:
>
> <p class="question">
> Who is Roger Rabbit?
> </p>
>
> My question is -- how can I extract the string "Who is Roger Rabbit?"
> from each page using php? You see, I want to store the questions in a
> database without having to re-type, or cut/paste, each one.
>
> Now, I can extract each question by using javascript --
>
> document.getElementById("question").innerHTML;
>
> -- and stepping through each page, but I don't want to use javascript
> for this.
>
> I have not found/created a working example of this using PHP. I tried
> using PHP's getElementByID(), but that requires the target file to be
> valid xml and the string to be contained within an ID and not a class.
> These pages do not support either requirement.
>
> Additionally, I realize that I can load the files and parse out what is
> between the <p> tags, but I was hoping for a "GetElementByClass" way to
> do this.
>
> So, is there one?
>
> Thanks,
>
> tedd

Why don't you just use a regex? I don't know of any easy way to process
content that is not valid XML/XHTML, because I'm not aware of a library
that loads such markup (but correct me if I'm wrong there).

I'm no regex expert, but I think the following would do it:

/<p\s+class="question"\s*>(.*?)<\/p>/s

The capture group is non-greedy so it stops at the first </p>, and the
"s" modifier lets "." match the line breaks around the question text.

(This is my first contribution here; I beg your pardon if something
went wrong.)

Regards,
Valentin Dreismann

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
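
For what it's worth, here is a minimal sketch of how the regex idea could be
wired up with preg_match(). The function names extract_question() and
extract_questions(), and the "*.html" file pattern, are my own assumptions
for illustration and are not from tedd's setup. The pattern is a slightly
adapted form of the one suggested above: it has a closing delimiter, a
non-greedy capture group, and the "s" modifier so the match can span line
breaks. It also assumes the <p> tag carries exactly the attribute
class="question" and nothing else.

```php
<?php
// Hypothetical helper: returns the trimmed question text from one page,
// or null when no <p class="question"> paragraph is found.
function extract_question(string $html): ?string
{
    // \s+    : at least one whitespace character between "p" and "class"
    // (.*?)  : non-greedy, so the match stops at the first closing </p>
    // s flag : "." also matches newlines; i flag: tag case is ignored
    $pattern = '/<p\s+class="question"\s*>(.*?)<\/p>/si';

    if (preg_match($pattern, $html, $matches) === 1) {
        return trim($matches[1]);
    }
    return null;
}

// Hypothetical helper: collects the question from every *.html file in a
// directory, keyed by file name. Files without a match are skipped.
function extract_questions(string $dir): array
{
    $questions = [];
    // glob() returns false on error, so fall back to an empty list.
    foreach (glob($dir . '/*.html') ?: [] as $file) {
        $html = file_get_contents($file);
        if ($html === false) {
            continue; // unreadable file
        }
        $question = extract_question($html);
        if ($question !== null) {
            $questions[basename($file)] = $question;
        }
    }
    return $questions;
}
```

Calling extract_questions() on the directory holding the 184 pages should
give an array that can be looped over for the database inserts. If the
markup varies (extra attributes, single quotes), the pattern would need
adjusting.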