Re: HTML text extraction

Hello,

on 08/18/2009 05:37 AM leledumbo said the following:
> Usually, a website gives preview of its articles by extracting some of the
> first characters. This is easy if the article is a pure text, but what if
> it's a HTML text? For instance, if I have the full text:
> 
> <p>
>   bla bla bla
>   <ul>
>     <li>item 1</li>
>     <li>item 2</li>
>     <li>item 3</li>
>   </ul>
> </p>
> 
> and I take the first 40 characters, it would result in:
> 
> <p>
>   bla bla bla
>   <ul>
>     <li>item
> 
> As you can see, the tags are incomplete and it might break other texts below
> it (I mean, other than this preview). I need a way to solve this problem.

You may want to try these HTML parser classes. They can parse (and even
validate) HTML and return an array of tag or data elements. You can use
that array to pick the first tags and data you need. Then you can use the
RewriteElement function to regenerate the HTML.

http://www.phpclasses.org/secure-html-filter
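
To illustrate the general idea (walk the tag and data elements, keep
only the first N characters of text, then close whatever tags are still
open), here is a minimal sketch. It is written in Python with the
standard-library html.parser module purely for illustration; it is not
the API of the classes linked above, and the names PreviewBuilder and
html_preview are made up for this example. It counts every text
character, whitespace included, and drops comments.

```python
import html
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PreviewBuilder(HTMLParser):
    """Hypothetical helper: keeps tags and text until `limit` text
    characters have been collected, remembering which tags are open."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.remaining = limit
        self.out = []        # pieces of the regenerated HTML
        self.open_tags = []  # stack of tags opened but not yet closed
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        # Re-emit the start tag exactly as it appeared in the source.
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: emit it, nothing to track.
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return  # past the limit, or a stray closing tag
        # Close everything up to and including the matching tag.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        piece = data[:self.remaining]
        # Text arrives unescaped, so escape it again on the way out.
        self.out.append(html.escape(piece, quote=False))
        self.remaining -= len(piece)
        if self.remaining <= 0:
            self.done = True


def html_preview(source, limit):
    """Return the first `limit` text characters of `source` as balanced HTML."""
    builder = PreviewBuilder(limit)
    builder.feed(source)
    builder.close()
    # Close whatever is still open, innermost first.
    closing = "".join(f"</{tag}>" for tag in reversed(builder.open_tags))
    return "".join(builder.out) + closing


print(html_preview(
    "<p>bla bla bla<ul><li>item 1</li><li>item 2</li></ul></p>", 16))
# prints: <p>bla bla bla<ul><li>item </li></ul></p>
```

The point is that the cut is made on the text only, never inside a tag,
and the stack of open tags tells you exactly which closing tags to
append so the preview cannot break the markup that follows it.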

-- 

Regards,
Manuel Lemos

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

