Re: Re: html analyzer

On Wed, 2010-05-19 at 13:24 -0400, Bill Guion wrote:

> At 12:30 AM +0200 5/19/10, Rene Veerman wrote:
> 
> >Hi.
> >
> >I'm trying to build an HTML analyzer that looks at natural words in HTML text.
> >
> >I'd like to build a routine that walks through the HTML character by
> >character, but I'm not sure how to properly walk through escaped "
> >and ' characters in JavaScript or other embedded languages. Skipping
> >the first " and ' is no problem, but the escaped " and ' that come
> >after that can get difficult, IMO.
> >
> >If you have any ideas on this, I'd like to hear them.
> >
> >--
> >---------------------------------
> >Greetings from Rene7705,
> >
> >My free open source webcomponents:
> >   http://code.google.com/u/rene7705/
> >   http://mediabeez.ws/downloads (and demos)
> >
> >http://www.facebook.com/rene7705
> >---------------------------------
> 
> Rene,
> 
> I agree with the previous post: what you want to do is non-trivial.
> However, to address your question: one approach is to keep a single
> quote flag (sqf) and a double quote flag (dqf). When you encounter
> the first quote of a type, set that flag. When you encounter the second
> quote of the same type, clear the flag. At the end, both flags should be
> clear, or the HTML is malformed. You can also get more sophisticated
> and verify that you do not encounter a single, double, single
> sequence, or a double, single, double sequence. That is more
> involved, because you have to remember which quote was first, second,
> and third: the third should be the same as the second, for example.
> 
>       -----===== Bill =====-----
> -- 
> 
> Don't find fault. Find a remedy. - Henry Ford
>    
> 
> 


It would have to be a lot more complicated than that. Consider:

print "document.write('<a href=\"#\" onmouseover=\"doSomething(\'argument\')\">link</a>')";

It's ugly, but it's possible. I've seen JavaScript used to write
JavaScript before, because it took less (albeit uglier) code than
adding event handlers in a cross-browser way.
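
To handle the escapes at a single level, a rough sketch (in JavaScript)
is to remember which quote character opened the current string and to
skip whatever follows a backslash. findStrings is just a name I made up,
and the sketch assumes the input is JavaScript source with no comments
or regex literals in it:

```javascript
// Hypothetical helper: collects the raw bodies (escapes left in place)
// of the top-level quoted strings in src.
function findStrings(src) {
  const strings = [];
  let quote = null; // quote character that opened the current string, or null
  let start = 0;    // index of the first character inside the current string

  for (let i = 0; i < src.length; i++) {
    const ch = src[i];
    if (quote === null) {
      if (ch === '"' || ch === "'") {
        quote = ch;
        start = i + 1;
      }
    } else if (ch === "\\") {
      i++; // skip the escaped character, whatever it is
    } else if (ch === quote) {
      strings.push(src.slice(start, i));
      quote = null;
    }
    // the other kind of quote inside a string is ordinary text
  }

  // A string still open at the end means the input is malformed.
  return { strings, unterminated: quote !== null };
}

// The JavaScript that the PHP print statement above would emit.
const js = String.raw`document.write('<a href="#" onmouseover="doSomething(\'argument\')">link</a>')`;
const result = findStrings(js);
console.log(result.strings.length); // 1
console.log(result.strings[0]);
```

One state variable is enough here, because once a string is open only
the same quote character (unescaped) can close it.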

The parser could, though, split off the strings it finds within
JavaScript like this and parse them with the same function, calling
itself recursively each time it encounters a string.
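
The recursive idea could look something like this. It is only a sketch:
collectStrings and STRING_LITERAL are made-up names, the regular
expression treats a backslash as an escape everywhere (which is not true
for plain HTML attribute values), and the unescape step just drops one
backslash per level rather than decoding \n and friends:

```javascript
// A quoted string: an opening quote, then escaped characters or anything
// that is neither a backslash nor the opening quote, then the same quote.
const STRING_LITERAL = /(["'])((?:\\.|(?!\1)[^\\])*)\1/gs;

// Hypothetical helper: records every string found in src together with
// its nesting depth, then scans the unescaped body of each one again.
function collectStrings(src, depth = 0, out = []) {
  for (const m of src.matchAll(STRING_LITERAL)) {
    // Simplification: remove one level of backslash escaping.
    const body = m[2].replace(/\\(.)/gs, "$1");
    out.push({ depth, text: body });
    collectStrings(body, depth + 1, out);
  }
  return out;
}

// The JavaScript that the PHP print statement above would emit.
const src = String.raw`document.write('<a href="#" onmouseover="doSomething(\'argument\')">link</a>')`;
const found = collectStrings(src);
for (const { depth, text } of found) {
  console.log(depth, text);
}
```

On that input it finds the HTML fragment at depth 0, the two attribute
values at depth 1, and 'argument' at depth 2. A real analyzer would
still need to know what language it is in at each level, since an
apostrophe in ordinary HTML text would be mistaken for a quote here.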

Thanks,
Ash
http://www.ashleysheridan.co.uk


