I've had quite a bit of luck using the html2text class by Jon Abernathy:
http://www.chuggnutt.com/html2text.php

It targets PHP 4 and is rather old code, but it does the job for me. In my case, "the job" is converting HTML to text when I send out emails in HTML format and want to offer a proper plain-text alternative.

To be honest, I haven't checked how it handles title/alt attributes on images, but I'm confident it handles them nicely. If it doesn't, you can add that yourself.

And if that doesn't suit your needs, you might want to take a look at this:
http://sourceforge.net/projects/simplehtmldom/

Regards,
Wouter

2009/12/15 Andrew Ballard <aballard@xxxxxxxxx>

> On Mon, Dec 14, 2009 at 6:43 PM, Ashley Sheridan
> <ash@xxxxxxxxxxxxxxxxxxxx> wrote:
> > I'm looking for a way to strip HTML tags out of some text content
> > (sourced from a web page) to leave just the text which I'll be running
> > some basic analysis on. The thing is, I want to preserve text that is in
> > alt and title attributes. I can't use any DOM functions, as I can't
> > guarantee that the content will be valid XHTML, although it should be
> > valid HTML.
> >
> > I'm happy doing this with string functions and regular expressions, but
> > I was wondering if something for this already existed? The server I plan
> > on putting this on does not have access to the shell (although it is a
> > Linux server) so I won't be able to have Lynx or Elinks parse the
> > content for me either :(
> >
> > Thanks,
> > Ash
> > http://www.ashleysheridan.co.uk
> >
>
> Are you sure you can't use DOM? It has a function specifically for
> parsing HTML that "does not have to be well-formed to load."
>
> http://www.php.net/manual/en/domdocument.loadhtml.php
>
> If that doesn't work, you might look at Zend_Filter_StripTags in ZF. I
> don't know if it will do exactly what you're after, but it seems to be
> more flexible than the strip_tags function built into PHP.
>
> Andrew
>
> --
> PHP General Mailing List (http://www.php.net/)

--
http://www.interpotential.com
http://www.ilikealot.com
Phone: +4520371433
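Following up on Andrew's DOM suggestion: here is a minimal sketch, assuming PHP 7.1+ with the DOM extension enabled, of pulling the text out of possibly malformed HTML while keeping alt and title values in document order. The function names html_to_text_with_attrs() and collect_text() are hypothetical helpers for illustration only; they are not part of html2text, simplehtmldom, or Zend Framework.

```php
<?php
// Sketch: extract text plus alt/title attribute values from HTML that
// may not be well-formed, using DOMDocument::loadHTML().

/**
 * Hypothetical helper: returns the text content of $html, with the values
 * of alt and title attributes inserted where their elements appear.
 */
function html_to_text_with_attrs(string $html): string
{
    // loadHTML() rejects empty input, so handle that case up front.
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();

    // loadHTML() tolerates markup that is not well-formed but emits
    // warnings for it; collect those internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $parts = [];
    collect_text($doc, $parts);

    // Join the pieces and collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/u', ' ', implode(' ', $parts)));
}

/**
 * Hypothetical recursive walker: appends text nodes and alt/title values
 * to $parts in document order, skipping script and style contents.
 * Comments, doctypes and processing instructions are ignored.
 */
function collect_text(DOMNode $node, array &$parts): void
{
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            $parts[] = $child->nodeValue;
        } elseif ($child instanceof DOMElement) {
            $tag = strtolower($child->nodeName);
            if ($tag === 'script' || $tag === 'style') {
                continue;
            }
            // Attribute text is emitted before the element's children.
            foreach (['alt', 'title'] as $attr) {
                if ($child->hasAttribute($attr)) {
                    $parts[] = $child->getAttribute($attr);
                }
            }
            collect_text($child, $parts);
        }
    }
}
```

For example, html_to_text_with_attrs('<p>Hello <img src="a.png" alt="a cat"> world</p>') should give "Hello a cat world". One caveat: loadHTML() assumes ISO-8859-1 when the markup declares no charset, so UTF-8 pages without a meta charset tag may need extra encoding handling before this is reliable.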