At 3:58 PM +0100 4/3/10, Ashley Sheridan wrote:
I don't think there is a getElementsByClass function. HTML5 is
proposing one, but that will most likely be implemented in
JavaScript before PHP's DOM. There is a way to tidy up the HTML to
make it XHTML, but I'm not sure what it is. If you know roughly
where in the document the HTML snippet is, you can use XPath to grab
it.
Failing that, what about a regex? It shouldn't be too hard to write
a regex to match your example above.
Thanks,
Ash
Ash:
I don't have a problem solving the problem the long way, which is to:
1. Load the file;
2. Parse between the markers;
3. Strip tags and collapse extra white space;
4. Save to the db.
In fact, here's the code I used to solve the problem:
//--------
$filesize = filesize($filename);
$file = fopen( $filename, "r" );
$text = fread( $file, $filesize );
fclose( $file );
$marker1 = "<p class=\"question\">";
$marker2 = "</p>";
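// Start just past the opening marker; look for the closing marker only after that point.
$first = strpos($text, $marker1) + strlen($marker1);
$last = strpos($text, $marker2, $first);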
$len = $last - $first;
$text = substr($text, $first , $len);
$text = strip_tags($text);
$space = array(' ', "\t", "\n", "\r", "\x0B", "\x0C");
$words = array();
$all_words = explode(' ', $text);
foreach ($all_words as $line)
{
$line = str_replace($space, '', $line);
if (strlen($line) > 0)
{
$words[] = $line;
}
}
$text = implode(' ',$words);
$text = htmlspecialchars($text);
//---------
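For what it's worth, the strip-and-collapse step can probably be done in one pass with preg_replace(). This is only a rough sketch, and clean_text() is a name made up for illustration, not something from my actual script. One difference to be aware of: the loop above glues together words that are separated only by a tab or newline, while this version keeps them apart with a single space.

```php
<?php
// Hypothetical helper (illustration only): strip tags, then collapse every run
// of white space into a single space before escaping the result.
function clean_text(string $fragment): string
{
    $text = strip_tags($fragment);
    // Same characters as the $space array: space, tab, LF, CR, VT, FF.
    $text = preg_replace('/[ \t\n\r\x0B\x0C]+/', ' ', $text);
    return htmlspecialchars(trim($text));
}
```

Calling clean_text() on the substring between the two markers should give the same kind of single-spaced text that is saved to the db.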
I was just exploring PHP's getElement thing and wasn't having much
luck with it.
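For the record, here is roughly what the XPath route Ash mentions might look like with PHP's DOM extension. Treat it as an untested-in-production sketch: first_question_text() is a hypothetical name, and it assumes the DOM extension is available and that the paragraph carries the class "question" as in the markers above.

```php
<?php
// Hypothetical helper (illustration only): return the text of the first
// <p class="question"> in an HTML string, or null if there is none.
function first_question_text(string $html): ?string
{
    if ($html === '') {
        return null;
    }
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match "question" as a whole word inside the class attribute.
    $query = '//p[contains(concat(" ", normalize-space(@class), " "), " question ")]';
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // textContent drops the nested tags; collapse the remaining white space.
    return trim(preg_replace('/[ \t\n\r\x0B\x0C]+/', ' ', $nodes->item(0)->textContent));
}
```

The class test is written with contains() and concat() so that a paragraph with several classes, such as class="question first", would still match.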
Cheers,
tedd
--
-------
http://sperling.com http://ancientstones.com http://earthstones.com