Hi,
You can use fgetss() or strip_tags() to take the tags off and
html_entity_decode() to transform the HTML entities.
I don't understand what you mean by putting it into paragraphs. Are you
talking about rewriting the HTML, or something else?
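
For the tag-stripping part, here is a minimal sketch of what I mean. It assumes the page is already in $document; the sample HTML below is made up for illustration, and the 'UTF-8' charset is an assumption about your page. Note that strip_tags() keeps the text inside <script> and <style> blocks, so this removes those first, and it decodes entities last so that something like &lt;b&gt; is not mistaken for a tag.

```php
<?php
// Hypothetical sample input; in your script this would be the fetched $document.
$document = '<html><head><title>My &amp; Title</title>'
          . '<script type="text/javascript">var x = 1;</script></head>'
          . '<body><p class="intro">Caf&eacute; &lt;open&gt;</p></body></html>';

// strip_tags() removes tags but keeps the text between <script>/<style> tags,
// so drop those blocks (tags and contents) first.
$withoutScripts = preg_replace('@<(script|style)\b[^>]*>.*?</\1>@si', '', $document);

// Remove the remaining tags, attributes included.
$plain = strip_tags($withoutScripts);

// Turn entities such as &amp; and &eacute; back into characters.
// The charset is an assumption; use whatever encoding the page really has.
$text = html_entity_decode($plain, ENT_QUOTES, 'UTF-8');

echo $text, "\n";
```

The title and the paragraph text come out run together here, since nothing in this sketch inserts line breaks between blocks.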
- Alex "Sunstorm"
On Tue, 28 Mar 2006 15:08:31 +0100, "Ministério Público"
<arquivomortovirtual@xxxxxxxxx> wrote:
Hi guys, I'm trying to retrieve an HTML page from a URL, which I have
already done with the following script:
$document = implode('', file('http://www.mysite.net/'));
Then I need to strip the HTML tags out of it, which I did with the
following script:
$search = array ('@<script[^>]*?>.*?</script>@si', // Strip out javascript
'@<[\/\!]*?[^<>]*?>@si', // Strip out HTML tags
'@([\r\n])[\s]+@', // Strip out white space
'@&(quot|#34);@i', // Replace HTML entities
'@&(amp|#38);@i',
'@&(lt|#60);@i',
'@&(gt|#62);@i',
'@&(nbsp|#160);@i',
'@&(iexcl|#161);@i',
'@&(cent|#162);@i',
'@&(pound|#163);@i',
'@&(copy|#169);@i',
'@&#(\d+);@e'); // evaluate as php
$replace = array ('',
'',
'\1',
'"',
'&',
'<',
'>',
' ',
chr(161),
chr(162),
chr(163),
chr(169),
'chr(\1)');
$text = preg_replace($search, $replace, $document);
My problem is that I still get jumbled text. I'd like to put a paragraph
break after each part of the page: one after the title, another after the
first block of text, and so on. Also, the attributes of the HTML tags are
still showing, so I'd like to remove these from the results as well.
Thanks for any ideas.
Rodrigo
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php