Re: html analyzer

Hello,

On 05/18/2010 07:30 PM, Rene Veerman said the following:
> Hi.
> 
> I'm trying to build an HTML analyzer that looks at natural words in HTML text.
> 
> I'd like to build a routine that walks through the HTML character by
> character, but I'm not sure how to properly walk through escaped "
> and ' characters in JavaScript or other embedded languages. Skipping
> the first " and ' is no problem, but the escaped " and ' that follow
> can get difficult, IMO.
> 
> If you have any ideas on this, I'd like to hear them.

It is better to use something that already exists. HTML parsing is not
trivial, and it gets worse when the HTML you are parsing is malformed.
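
To show why this gets tricky, here is a minimal sketch of just the
escaped-quote part of a character-by-character walk. It is written in
Python purely for illustration, and the function name skip_quoted is
hypothetical. It assumes a backslash escapes the next character, as in
JavaScript string literals.

```python
def skip_quoted(text: str, start: int) -> int:
    """Return the index just past the string literal opening at text[start].

    Assumption: text[start] is a " or ' and a backslash escapes the
    character that follows it (JavaScript-style string literals).
    """
    quote = text[start]
    if quote not in "\"'":
        raise ValueError("start must point at a quote character")
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2          # skip the backslash and the character it escapes
        elif ch == quote:
            return i + 1    # position just past the closing quote
        else:
            i += 1
    return len(text)        # unterminated literal: consume to end of input
```

This handles only the quoting. It does not deal with comments,
regular-expression literals, or a </script> that appears inside a
string, which is part of why an existing parser is the safer route.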

You may want to try this HTML parser package. It can parse HTML, CSS,
DTD, etc. in pure PHP, with no special extensions required. It tolerates
malformed HTML and can even filter insecure HTML and CSS that may contain
dangerous JavaScript. In fact, it was written mainly for that purpose.

http://www.phpclasses.org/secure-html-filter
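
As a rough illustration of the parser-based approach (not the package
above, whose API is not shown here), the sketch below uses the
html.parser module from Python's standard library. The parser works out
where the script and style elements begin and end, so the code never has
to walk through quoted strings itself. The names WordCollector and
natural_words, and the regular expression used to define a "word", are
assumptions made for this example.

```python
import re
from html.parser import HTMLParser


class WordCollector(HTMLParser):
    """Collect words from text nodes, ignoring script and style content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.words = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Assumed definition of a word: runs of letters, optionally
            # joined by apostrophes (for example "It's").
            self.words.extend(re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", data))


def natural_words(html: str) -> list:
    """Return the words found in the visible text of an HTML string."""
    collector = WordCollector()
    collector.feed(html)
    collector.close()
    return collector.words
```

Called on a page that mixes markup with an inline script, for example
natural_words('<p>Hello</p><script>var s = "x";</script>'), this returns
only the words from the paragraph text.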


-- 

Regards,
Manuel Lemos

Find and post PHP jobs
http://www.phpclasses.org/jobs/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

