Re: Email and HTML Parser Library

Have you considered using the IMAP extension? It would solve pretty
much all your problems with regard to "interpreting" the contents of a
message (headers, MIME parts, encodings). It's a bit slow, though.
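
To make "interpreting" concrete, here is a rough sketch of what a
MIME-aware parser does for steps 1 and 2 of your list. It is NOT the PHP
IMAP extension: I've swapped in Python's standard email package purely as
a language-neutral illustration, and split_message is a made-up helper
name. Treat it as a starting point, not a finished implementation.

```python
from email import message_from_string, policy


def split_message(raw):
    """Return (headers, text_parts) for a raw message string.

    split_message is a hypothetical helper, not part of any library.
    """
    msg = message_from_string(raw, policy=policy.default)

    # Step 1: the headers, separated from the body, as (name, value) pairs.
    headers = list(msg.items())

    # Step 2: walk the MIME tree and keep only plain-text and HTML parts.
    text_parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        ctype = part.get_content_type()
        if ctype in ("text/plain", "text/html"):
            text_parts.append((ctype, part.get_content()))
    return headers, text_parts
```

The headers list and the (content type, text) pairs map fairly directly
onto the separate header and body tables you describe.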

As for finding the hrefs and img srcs, you can easily get away with a
couple of regular expressions.
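
Something along these lines, sketched in Python rather than PHP just to
show the idea (the same patterns should carry over to preg_match_all).
The names extract_links, ATTR_RE and BASE_RE are my own, and the patterns
assume reasonably well-formed, quoted attributes; unquoted values or
unusual markup will slip through.

```python
import re
from urllib.parse import urljoin

# href on <a> tags and src on <img> tags, single- or double-quoted.
ATTR_RE = re.compile(
    r"""<(a|img)\b[^>]*?\b(href|src)\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)
# An optional <base href="..."> that relative URLs resolve against.
BASE_RE = re.compile(
    r"""<base\b[^>]*?\bhref\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def extract_links(html, default_base=""):
    """Return a list of ("href" | "img", absolute_url) tuples."""
    match = BASE_RE.search(html)
    base = match.group(1) if match else default_base
    links = []
    for tag, attr, url in ATTR_RE.findall(html):
        tag, attr = tag.lower(), attr.lower()
        # Keep only <a href> and <img src>; skip odd pairings like <a src>.
        if (tag, attr) in (("a", "href"), ("img", "src")):
            kind = "href" if tag == "a" else "img"
            links.append((kind, urljoin(base, url)))
    return links
```

Each tuple is one row for your URL table, already fully qualified against
the base href when there is one.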

Hope this helps.


Marco

-- 
------------
php|architect - The magazine for PHP Professionals
The first monthly worldwide magazine dedicated to PHP programmers
Check us out on the web at http://www.phparch.com

On Sat, 2002-11-09 at 15:57, Peter Beckman wrote:
> Hey Folks:
> 
> I admit, I haven't searched for this anywhere yet, but I thought I'd ask
> for opinions first.
> 
> I'm looking to parse an email.  Some emails are HTML, some are not.
> 
> What I want to do with an email is:
> 
>     1. Split the headers from the body
>     2. Remove MIME attachments that aren't txt or html from the body
>     3. Grab all the HREF URLs in the body as well as image SRCs, fully
>        qualified: if there is a "<base href=" at the beginning, the
>        library should note this and fully qualify HREFs and SRCs (as well
>        as anything else). Basically I want a list (or the ability to build
>        a list) that looks like this:
> 
>        href http://www.purplecow.com/hhs/
>        img  http://purplecow.com/gfx/icons/new.gif
> 
> Are there any good libraries of functions that would aid me in this effort?
> The idea is to store the headers in the DB, store the body in another
> table, and store hrefs and image URLs in another table.  The URLs will be
> used to see if there are redirects to somewhere else and make a "parent"
> association.
> 
> Any thoughts, pointers, urls or code would be appreciated!
> 
> Peter
> ---------------------------------------------------------------------------
> Peter Beckman            Systems Engineer, Fairfax Cable Access Corporation
> beckman@purplecow.com                             http://www.purplecow.com/
> ---------------------------------------------------------------------------
> 
> 
> -- 
> PHP Database Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
> 




