Email and HTML Parser Library

Hey Folks:

I admit, I haven't searched for this anywhere yet, but I thought I'd ask
for opinions first.

I'm looking to parse an email.  Some emails are HTML, some are not.

What I want to do with an email is:

    1. Split the headers from the body
    2. Remove MIME attachments that aren't plain text or HTML from the body
    3. Grab all the HREF URLs in the body as well as image SRCs, fully
       qualified: if there is a "<base href=" at the beginning, the
       library should note this and fully qualify HREFs and SRCs (as well
       as anything else).  Basically I want a list (or the ability to
       build a list) that looks like this:

       href http://www.purplecow.com/hhs/
       img  http://purplecow.com/gfx/icons/new.gif
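
To make the three steps above concrete, here is a minimal sketch in Python using only the standard library (email, html.parser, urllib.parse). It is one possible shape for the job, not a recommendation of a particular library. The names LinkCollector and parse_email are made up for this illustration, and the sketch assumes the whole message is available as a single string.

```python
from email import message_from_string
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (kind, url) pairs from <a> and <img>."""

    def __init__(self):
        super().__init__()
        self.base = None   # value of the first <base href="...">, if any
        self.links = []    # raw (kind, url) pairs in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and self.base is None and attrs.get("href"):
            self.base = attrs["href"]
        elif tag == "a" and attrs.get("href"):
            self.links.append(("href", attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.links.append(("img", attrs["src"]))

    def qualified(self):
        # Without a <base href>, relative URLs cannot be resolved from the
        # email alone, so they are returned unchanged.
        if self.base is None:
            return list(self.links)
        return [(kind, urljoin(self.base, url)) for kind, url in self.links]


def parse_email(raw):
    """Return (headers, bodies, links) for a raw message string."""
    msg = message_from_string(raw)

    # Step 1: headers as a list of (name, value) pairs.
    headers = msg.items()

    # Step 2: keep only text/plain and text/html parts; skip everything else.
    bodies = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        charset = part.get_content_charset() or "utf-8"
        bodies.append((ctype, payload.decode(charset, errors="replace")))

    # Step 3: pull href/img URLs out of the HTML parts only.
    links = []
    for ctype, text in bodies:
        if ctype == "text/html":
            collector = LinkCollector()
            collector.feed(text)
            collector.close()
            links.extend(collector.qualified())

    return headers, bodies, links
```

Printing each pair from links as "kind url" would give a list in the format shown above. URLs that appear in plain-text parts are not picked up by this sketch.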

Are there any good libraries of functions that would aid me in this effort?
The idea is to store the headers in the DB, store the body in another
table, and store hrefs and image URLs in another table.  The URLs will be
used to see if there are redirects to somewhere else and make a "parent"
association.
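
As a rough illustration of that storage layout, here is a sketch using Python's built-in sqlite3 module. The table and column names (message, header, body, url, parent_id) and the function store_message are assumptions made for this example. In particular, the nullable parent_id column is only one guess at how the "parent" association could be recorded once redirects have been checked.

```python
import sqlite3

# Hypothetical schema: one row per message, with headers, bodies and URLs
# each kept in their own table and linked back by message_id.
SCHEMA = """
CREATE TABLE message (
    id INTEGER PRIMARY KEY
);
CREATE TABLE header (
    message_id INTEGER NOT NULL REFERENCES message(id),
    name       TEXT NOT NULL,
    value      TEXT NOT NULL
);
CREATE TABLE body (
    message_id   INTEGER NOT NULL REFERENCES message(id),
    content_type TEXT NOT NULL,
    content      TEXT NOT NULL
);
CREATE TABLE url (
    id         INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL REFERENCES message(id),
    kind       TEXT NOT NULL,            -- 'href' or 'img'
    url        TEXT NOT NULL,
    parent_id  INTEGER REFERENCES url(id) -- NULL until a redirect is found
);
"""


def store_message(conn, headers, bodies, links):
    """Insert one parsed message and return its new message id."""
    cur = conn.execute("INSERT INTO message DEFAULT VALUES")
    message_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO header (message_id, name, value) VALUES (?, ?, ?)",
        [(message_id, name, value) for name, value in headers],
    )
    conn.executemany(
        "INSERT INTO body (message_id, content_type, content) VALUES (?, ?, ?)",
        [(message_id, ctype, text) for ctype, text in bodies],
    )
    conn.executemany(
        "INSERT INTO url (message_id, kind, url) VALUES (?, ?, ?)",
        [(message_id, kind, url) for kind, url in links],
    )
    conn.commit()
    return message_id
```

The redirect check itself is left out here; it would fill in parent_id after fetching each stored URL.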

Any thoughts, pointers, urls or code would be appreciated!

Peter
---------------------------------------------------------------------------
Peter Beckman            Systems Engineer, Fairfax Cable Access Corporation
beckman@purplecow.com                             http://www.purplecow.com/
---------------------------------------------------------------------------


-- 
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

