Ahmed Abdel-Aliem wrote:
> Does anyone know of a good tutorial for parsing HTML files?
> I have an HTML page and I want to parse information from it to insert
> it into MySQL.
> I have good experience in PHP, but I haven't written a parser before.
> Can anyone help, please?

TidyHTML is supposed to be good at that. I've never actually tried it, but John Coggeshall's presentation a few months ago at the Chicago PHP User Group meeting was pretty compelling.

If you only need a few small bits of information from web pages whose format doesn't change often, you can probably get it done quickly and easily with http://php.net/explode. I've scraped a lot of stuff that way myself.

You simply search the HTML for a distinctive tag that is unlikely to change often and that appears shortly before the content you want. Then use http://php.net/explode with that tag as the delimiter.

For example, on a site with calendar events, you might use:

<?php
  //Fetch the whole page as a single string
  $html = file_get_contents('http://example.com/');
  if ($html === false) {
    die('Could not fetch the page');
  }
  $parts = explode('<td class="event_date"', $html);
  //The first chunk is everything before the first event, so drop it
  array_shift($parts);
  foreach ($parts as $event) {
    //Pad to three cells so a short row doesn't raise notices
    list($date, $speaker, $description) = array_pad(explode('</td>', $event), 3, '');
    //Prepend <td because 'explode' stripped it off above
    $date = strip_tags("<td $date");
    $speaker = strip_tags($speaker);
    $description = strip_tags($description);
    //Double-check the data as a valid date,
    //maybe even speaker/description as non-empty
    //and either log error or insert to your database
  }
?>

Most sites with content you want to scrape on a routine basis are pretty predictable. CSS classes can be particularly useful for finding the bits you want to scrape.

Occasionally I run across a site that's hand-edited and completely unpredictable -- and usually not worth scraping, in my experience.

-- 
Like Music?
http://l-i-e.com/artists.htm

-- 
PHP General Mailing List (http://www.php.net/)
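
P.S. The explode example above leaves the "double-check the data" and "insert to your database" steps as comments. Here is a rough sketch of how they might look. Treat all of it as assumption rather than a finished implementation: the function names scrape_events() and store_events() are made up for illustration, the dates are assumed to be in YYYY-MM-DD form, and the `events` table with its column names is hypothetical. PDO with a prepared statement is just one reasonable way to reach MySQL.

```php
<?php
// Hypothetical helper: split the page on the marker tag and return
// only the rows that pass validation.
function scrape_events(string $html): array
{
    $parts = explode('<td class="event_date"', $html);
    array_shift($parts); // text before the first marker is not an event

    $events = [];
    foreach ($parts as $event) {
        // Limit the split and pad to three cells so short rows are safe.
        $cells = array_pad(explode('</td>', $event, 4), 3, '');

        // Prepend <td because explode() removed it from the first cell.
        $date        = trim(strip_tags('<td ' . $cells[0]));
        $speaker     = trim(strip_tags($cells[1]));
        $description = trim(strip_tags($cells[2]));

        // Assumed date format: YYYY-MM-DD. Re-formatting the parsed date
        // catches impossible dates such as 2005-02-30, which PHP would
        // otherwise roll over into March.
        $parsed = DateTime::createFromFormat('!Y-m-d', $date);
        $dateOk = $parsed !== false && $parsed->format('Y-m-d') === $date;

        if (!$dateOk || $speaker === '' || $description === '') {
            error_log('Skipping malformed event row: ' . substr($event, 0, 60));
            continue;
        }

        $events[] = [
            'date'        => $date,
            'speaker'     => $speaker,
            'description' => $description,
        ];
    }
    return $events;
}

// Hypothetical helper: insert validated rows with a prepared statement.
// The table and column names are placeholders for your own schema.
function store_events(PDO $pdo, array $events): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO events (event_date, speaker, description) VALUES (?, ?, ?)'
    );
    foreach ($events as $e) {
        $stmt->execute([$e['date'], $e['speaker'], $e['description']]);
    }
}
```

You would call scrape_events() on the string you fetched from the page and hand the result to store_events() along with your own PDO connection. Keeping the parsing separate from the fetching means you can test it against a saved copy of the page, which is handy when the site changes its markup on you.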