Extract printable text from web page using preg_match




I am trying to write a regex function to extract the readable (visible, screen-rendered) portion of any web page. Specifically, I only want the text between the <body> and </body> tags, excluding the contents of any <script> or <style> elements in the document, and also excluding comments. Has anyone here seen such a regex? Is it possible to do this in one expression?

...Rene
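A single regular expression is unlikely to handle this robustly, because script, style, and comment blocks can appear anywhere inside the body and can themselves contain text that looks like tags. The sketch below assumes PHP 7 or later and reasonably well-formed markup. It splits the job into one preg_match (to isolate the body) followed by a few preg_replace passes. The function name extractVisibleText is hypothetical, and this is not a full HTML parser. For arbitrary real-world pages, PHP's DOMDocument extension is the more robust route.

```php
<?php
// Minimal sketch (hypothetical helper): pull the visible text out of an HTML page
// with PCRE. Assumes reasonably well-formed markup; this is not a full HTML parser.
function extractVisibleText(string $html): string
{
    // 1. Keep only the content between <body ...> and the last </body>, if present.
    if (preg_match('~<body\b[^>]*>(.*)</body\s*>~is', $html, $m)) {
        $html = $m[1];
    }

    // 2. Replace comments, and <script>/<style> elements with their contents, by a space.
    //    Comments go first so a commented-out <script> tag cannot confuse the later patterns.
    $html = preg_replace(
        [
            '~<!--.*?-->~s',
            '~<script\b[^>]*>.*?</script\s*>~is',
            '~<style\b[^>]*>.*?</style\s*>~is',
        ],
        ' ',
        $html
    );

    // 3. Replace every remaining tag by a space so text from adjacent elements does not merge.
    $text = preg_replace('~<[^>]*>~', ' ', $html);

    // 4. Decode entities such as &amp;, then collapse runs of ASCII whitespace.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return trim(preg_replace('~\s+~', ' ', $text));
}
```

Replacing tags with a space, rather than calling strip_tags(), keeps markup such as <p>a</p><p>b</p> from collapsing into "ab". The trade-off is that a literal > inside an attribute value will end the tag match early.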



