Ashley Sheridan wrote:
I'm looking for a way to strip HTML tags out of some text content (sourced from a web page) to leave just the text which I'll be running some basic analysis on. The thing is, I want to preserve text that is in alt and title attributes.

I can't use any DOM functions, as I can't guarantee that the content will be valid XHTML, although it should be valid HTML. I'm happy doing this with string functions and regular expressions, but I was wondering if something for this already existed?

The server I plan on putting this on does not have access to the shell (although it is a Linux server) so I won't be able to have Lynx or Elinks parse the content for me either :(

Thanks,
Ash
http://www.ashleysheridan.co.uk
This sounds easy with a simple regular expression, and certainly easier than bending a class or DOM function to do the job.
How do you want to retain the text that is in the alt and title attributes? What form do you want it in? For example, what should <img xxxx alt="foo"> become in the output?
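If simply dropping the attribute text in where the tag used to be is acceptable, a rough sketch might look like the following. It is only one possible approach, and it makes several assumptions: PHP 7 or later, UTF-8 input, and that no attribute value contains a literal ">". The function name strip_tags_keep_attrs is made up for this example and is not a PHP built-in.

```php
<?php
// Hypothetical helper (not a PHP built-in): strips tags, keeps alt/title text.
function strip_tags_keep_attrs(string $html): string
{
    // Remove comments and script/style blocks so their contents are not treated as text.
    $html = preg_replace('~<!--.*?-->|<(script|style)\b[^>]*>.*?</\1\s*>~is', ' ', $html);

    // Swap every remaining tag for the values of its alt/title attributes (if any).
    $text = preg_replace_callback(
        '~<[^>]*>~',
        function (array $m): string {
            // Branch reset (?|...) puts the value in group 1 for all three quoting styles.
            // The lookbehind requires whitespace before the name, so data-alt is ignored.
            preg_match_all(
                '~(?<=\s)(?:alt|title)\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))~i',
                $m[0],
                $found
            );
            // Drop empty values such as alt="".
            $values = array_filter($found[1], 'strlen');
            // Pad with spaces so attribute text does not fuse with neighbouring words.
            return ' ' . implode(' ', $values) . ' ';
        },
        $html
    );

    // Decode entities such as &amp; and collapse the whitespace left by removed tags.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return trim(preg_replace('~\s+~', ' ', $text));
}

echo strip_tags_keep_attrs(
    '<p>See <img src="a.png" alt="foo"> and <a href="#" title="bar">this</a>.</p>'
);
```

For that sample input the sketch prints "See foo and bar this ." (every tag becomes a space, so a stray space can appear before punctuation). That should be harmless for basic text analysis, but a pattern like this can still be fooled by unusual markup, for example an attribute value that itself contains something looking like title=... after a space.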
-- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php