strip tags but preserve title attributes

I'm looking for a way to strip HTML tags out of some text content
(sourced from a web page) to leave just the text which I'll be running
some basic analysis on. The thing is, I want to preserve text that is in
alt and title attributes. I can't use any DOM functions, as I can't
guarantee that the content will be valid XHTML, although it should be
valid HTML.

I'm happy doing this with string functions and regular expressions, but
I was wondering if something for this already existed? The server I plan
on putting this on does not have access to the shell (although it is a
Linux server) so I won't be able to have Lynx or Elinks parse the
content for me either :(
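
For what it's worth, the string-and-regex route could look roughly like the sketch below. It assumes PHP 7 or later with the standard PCRE functions. The function name strip_tags_keep_attrs is made up for illustration and is not a built-in. It only approximates HTML parsing, so an alt= or title= that appears inside another attribute's value would be picked up too.

```php
<?php
// Hypothetical helper (not a built-in): strips tags but keeps alt/title text.
function strip_tags_keep_attrs(string $html): string
{
    // Remove script/style blocks and comments together with their contents.
    $html = preg_replace('~<(script|style)\b[^>]*>.*?</\1\s*>~is', ' ', $html) ?? $html;
    $html = preg_replace('~<!--.*?-->~s', ' ', $html) ?? $html;

    // Matches one tag; quoted attribute values are allowed to contain ">".
    $tag = '~<[/!]?[a-z](?:[^>"\']|"[^"]*"|\'[^\']*\')*>~i';
    // Matches alt="..." / title='...' / title=bare; the branch-reset group
    // (?|...) puts the value in capture group 1 whichever quoting is used.
    $attr = '~\s(?:alt|title)\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))~i';

    $text = preg_replace_callback(
        $tag,
        function (array $m) use ($attr): string {
            preg_match_all($attr, $m[0], $found);
            // Swap the tag for its alt/title values, padded with spaces so
            // neighbouring words do not run together.
            return ' ' . implode(' ', $found[1]) . ' ';
        },
        $html
    ) ?? $html;

    // Decode entities such as &amp; and collapse runs of whitespace.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return trim(preg_replace('~\s+~', ' ', $text) ?? $text);
}

echo strip_tags_keep_attrs(
    '<p>Hello <img src="a.png" alt="a red cat"> &amp; '
    . '<a href="/x" title="more info">friends</a></p>'
);
// prints: Hello a red cat & more info friends
```

Every tag is replaced with a space rather than nothing, which keeps words in adjacent block elements apart but also splits a word that has inline markup in the middle of it. That trade-off seemed acceptable for basic text analysis.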

Thanks,
Ash
http://www.ashleysheridan.co.uk


