M5 wrote:
> No, it's not a very good solution. strip_tags() will leave the contents
> of <head>, <style> and <script> (in the body or out). Comments are
> also included.
>
> I know it's possible to use non-regex strpos/substr to extract everything
> within <body>, but as another poster correctly said, this assumes a
> consistent HTML document (which it is not).
>
> I realize now that such a regex would be rather sophisticated, but I
> thought surely it must exist, since text-scraping the readable content
> of a web page can't be rare.

I've said it before, but a low-tech solution is to run the program "lynx" with the -dump argument and capture the output back into PHP.

I'm assuming you are on Linux or OS X, as I've not heard of anyone using lynx on Windows.

There are loads of command-line options to control how lynx formats its output, so you have very fine-grained control here.

Col

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
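For anyone who wants to try the lynx route, here is a rough sketch of how the capture might look. It assumes lynx is installed and on the PATH of the user PHP runs as, and that exec() is not disabled on the host. The function names buildLynxCommand() and pageText() are made up for this example; -dump and -nolist are standard lynx options.

```php
<?php
// Sketch only: assumes the `lynx` binary is installed and on the PATH.

/**
 * Build the shell command that asks lynx to render a page as plain text.
 * buildLynxCommand() is a hypothetical helper name, not part of PHP or lynx.
 */
function buildLynxCommand(string $url): string
{
    // -dump writes the rendered page to stdout and exits;
    // -nolist leaves out the numbered list of links that lynx
    // otherwise appends to a dump.
    // escapeshellarg() quotes the URL so it cannot break out of the command.
    return 'lynx -dump -nolist ' . escapeshellarg($url);
}

/**
 * Return the readable text of a page, or null if lynx exited with an error.
 * pageText() is likewise a hypothetical name.
 */
function pageText(string $url): ?string
{
    $lines = [];
    $status = 0;
    // exec() fills $lines with one entry per output line
    // and stores the exit code in $status.
    exec(buildLynxCommand($url), $lines, $status);

    return $status === 0 ? implode("\n", $lines) : null;
}
```

Calling pageText('http://www.php.net/') should then give back roughly what you would see in a lynx session, with the contents of <head>, <style> and <script> left out because lynx renders the page rather than just stripping tags. If the URL comes from user input, it is worth checking that it really starts with http:// or https:// before handing it to lynx.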