On Tue, 2009-08-18 at 01:37 -0700, leledumbo wrote:
> Usually, a website gives a preview of its articles by extracting the first
> few characters. This is easy if the article is plain text, but what if
> it's HTML? For instance, if I have the full text:
>
> <p>
> bla bla bla
> <ul>
> <li>item 1</li>
> <li>item 2</li>
> <li>item 3</li>
> </ul>
> </p>
>
> and I take the first 40 characters, the result would be:
>
> <p>
> bla bla bla
> <ul>
> <li>item
>
> As you can see, the tags are left unclosed, and they might break other
> text on the page below it (I mean, text other than this preview). I need
> a way to solve this problem.
>
> --
> Sent from the PHP - General mailing list archive at Nabble.com.

You could do a couple of things:

* Extract all the content and use strip_tags() to remove all the HTML
  markup. In the example you gave, the result might look a bit odd,
  because the list items would run straight into the paragraph text.
* Load the extracted content into the DOM and grab the text you need from
  the node values. That way you can limit the preview to a specific number
  of text characters and, with a bit of work, preserve the original markup
  tags too.

Thanks,
Ash
http://www.ashleysheridan.co.uk

--
PHP General Mailing List (http://www.php.net/)
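The following is a minimal PHP sketch of the strip_tags() suggestion. It assumes UTF-8 input and works in three steps: remove the tags, collapse the whitespace the markup leaves behind, and cut the plain text to a character limit. text_preview() and its $ellipsis parameter are illustrative names, not part of PHP. strip_tags(), html_entity_decode() and the preg_* functions are standard.

```php
<?php
// Sketch of the strip_tags() approach: drop all markup, tidy whitespace,
// then cut the remaining plain text to $limit characters.
// text_preview() is a hypothetical helper, not a PHP built-in.
function text_preview(string $html, int $limit, string $ellipsis = '...'): string
{
    // Remove every tag; the text between the tags is kept.
    $text = strip_tags($html);
    // Decode entities such as &amp; so each counts as one character.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Collapse the newlines/indentation left behind by the removed markup.
    $text = trim(preg_replace('/\s+/u', ' ', $text));
    // Split into UTF-8 characters with PCRE, so mbstring is not required.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    if (count($chars) <= $limit) {
        return $text;
    }
    return rtrim(implode('', array_slice($chars, 0, $limit))) . $ellipsis;
}

$html = "<p>\nbla bla bla\n<ul>\n<li>item 1</li>\n<li>item 2</li>\n<li>item 3</li>\n</ul>\n</p>";
echo text_preview($html, 18), "\n"; // bla bla bla item 1...
```

The output shows the oddity mentioned above: the list items run into the paragraph text, and nothing indicates that they were a list.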
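The following is a rough sketch of the DOM-based suggestion. It assumes PHP 7.1 or later with the DOM extension (enabled by default) and UTF-8 input. The approach works like this:

1. Parse the fragment with DOMDocument::loadHTML().
2. Walk the tree while counting text characters.
3. Shorten the text node that crosses the limit and remove every node after it.

Because the preview is written back out from the tree with saveHTML(), every element that survives gets its closing tag. html_preview(), trim_children() and utf8_chars() are hypothetical helper names.

```php
<?php
// Sketch of the DOM approach: count text characters while walking the tree,
// cut the text node that crosses the limit, drop everything after it, and
// re-serialize so every surviving element is properly closed.
// html_preview(), trim_children() and utf8_chars() are hypothetical names.

// Split a UTF-8 string into characters (PCRE, so mbstring is not required).
function utf8_chars(string $s): array
{
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

// Keep at most $remaining characters of text below $node; $remaining is
// passed by reference so the budget is shared across the whole fragment.
function trim_children(DOMNode $node, int &$remaining): void
{
    // Copy the live DOMNodeList first: removing children while iterating
    // over it directly would skip siblings.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($remaining <= 0) {
            $node->removeChild($child);          // budget spent: drop the rest
        } elseif ($child instanceof DOMText) {
            $chars = utf8_chars($child->data);
            if (count($chars) > $remaining) {    // this text crosses the limit
                $child->data = implode('', array_slice($chars, 0, $remaining));
                $remaining = 0;
            } else {
                $remaining -= count($chars);
            }
        } elseif ($child->hasChildNodes()) {
            trim_children($child, $remaining);   // descend into elements
        }
    }
}

function html_preview(string $html, int $limit): string
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);    // tolerate sloppy markup
    // Wrap the fragment in a full page; the meta tag declares UTF-8.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body>' . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $body = $doc->getElementsByTagName('body')->item(0);
    $remaining = $limit;
    trim_children($body, $remaining);

    // Serialize only the fragment, not the wrapper <html>/<body>.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$html = '<p>bla bla bla</p><ul><li>item 1</li><li>item 2</li><li>item 3</li></ul>';
echo html_preview($html, 15), "\n";
```

With a limit of 15, this produces the paragraph followed by `<ul><li>item</li></ul>`. This is the same cut as in the question, but with every tag closed. libxml may insert line breaks between tags when serializing.

Two caveats apply:

* Whitespace-only text nodes between tags count toward the limit.
* The sample keeps `<p>` and `<ul>` as siblings because HTML does not allow a `<ul>` inside a `<p>`. The parser may rearrange markup that nests them.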