Re: strip tags but preserve title attributes

I've had quite some luck using the html2text class by Jon Abernathy

   http://www.chuggnutt.com/html2text.php

It targets PHP 4 and is rather old code, but it does the job for me.
The 'job' here is converting HTML to text when I send out emails in HTML
format and want to offer a proper plain-text alternative.

To be honest, I haven't checked how it handles title/alt attributes on
images. I expect it copes with them reasonably well, and if it doesn't,
the class is simple enough that you can add that yourself.

And if that doesn't suit your needs - you might want to take a look at this:

    http://sourceforge.net/projects/simplehtmldom/
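
For what it's worth, the DOM route Andrew mentions below can be sketched
roughly like this. It is only a sketch, assuming PHP 5 with the DOM
extension enabled; text_with_attributes() and collect_text() are names I
made up for illustration and are not part of html2text, simplehtmldom or
PHP itself. It relies on DOMDocument::loadHTML() tolerating markup that is
not well-formed, and walks the resulting tree, keeping text nodes plus any
title and alt attribute values.

```php
<?php
// Hypothetical helpers for illustration; not taken from any library
// mentioned in this thread.

// Recursively collect visible text plus title/alt attribute values.
function collect_text(DOMNode $node, array &$parts)
{
    if ($node instanceof DOMText) {
        $value = trim($node->nodeValue);
        if ($value !== '') {
            $parts[] = $value;
        }
        return;
    }

    if ($node instanceof DOMElement) {
        $name = strtolower($node->nodeName);
        // Script and style contents are not readable text; skip them.
        if ($name === 'script' || $name === 'style') {
            return;
        }
        foreach (array('title', 'alt') as $attr) {
            if ($node->hasAttribute($attr)) {
                $value = trim($node->getAttribute($attr));
                if ($value !== '') {
                    $parts[] = $value;
                }
            }
        }
    }

    foreach ($node->childNodes as $child) {
        collect_text($child, $parts);
    }
}

function text_with_attributes($html)
{
    $doc = new DOMDocument();

    // Sloppy markup makes the parser emit warnings; collect them
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $parts = array();
    collect_text($doc, $parts);

    return implode(' ', $parts);
}

echo text_with_attributes(
    '<p title="Greeting">Hello <img src="a.png" alt="World"> there</p>'
), "\n";
```

Note that loadHTML() assumes ISO-8859-1 unless the markup declares a
charset, so pages in other encodings may need extra handling before the
text is analysed.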

Regards,
Wouter

2009/12/15 Andrew Ballard <aballard@xxxxxxxxx>

> On Mon, Dec 14, 2009 at 6:43 PM, Ashley Sheridan
> <ash@xxxxxxxxxxxxxxxxxxxx> wrote:
> > I'm looking for a way to strip HTML tags out of some text content
> > (sourced from a web page) to leave just the text which I'll be running
> > some basic analysis on. The thing is, I want to preserve text that is in
> > alt and title attributes. I can't use any DOM functions, as I can't
> > guarantee that the content will be valid XHTML, although it should be
> > valid HTML.
> >
> > I'm happy doing this with string functions and regular expressions, but
> > I was wondering if something for this already existed? The server I plan
> > on putting this on does not have access to the shell (although it is a
> > Linux server) so I won't be able to have Lynx or Elinks parse the
> > content for me either :(
> >
> > Thanks,
> > Ash
> > http://www.ashleysheridan.co.uk
> >
>
> Are you sure you can't use DOM? It has a function specifically for
> parsing HTML that "does not have to be well-formed to load."
>
> http://www.php.net/manual/en/domdocument.loadhtml.php
>
>
> If that doesn't work, you might look at Zend_Filter_StripTags in ZF. I
> don't know if it will do exactly what you're after, but it seems to be
> more flexible than the strip_tags function built into PHP.
>
> Andrew
>


-- 
http://www.interpotential.com
http://www.ilikealot.com

Phone: +4520371433
