Re: cURLing sites for images

On Mon, May 15, 2006 5:36 pm, Wolf wrote:
> OK, I give up (for now at least).  I've been trying to figure out a way
> to run a single script and download my favorite comics.  The problem is
> that:
>
> 1.  The URLs include numerous websites (and the image is hosted on
> another server from the original)
> 2.  The filename can change but has the same basic name
>
> What I would love to do is use cURL to load the list of URLs, and then
> save all the images within each page into a directory based off the
> date.  What I have not been able to figure out how to do is use CURL to
> open the page and save all the *.gif and *.jpg files only.

Use a preg (preg_match_all) on the page's HTML to find the .gif and
.jpg URLs, and then set CURLOPT_BINARYTRANSFER before fetching each
.gif/.jpg.
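As a rough sketch of the "preg" step, something like the following could narrow the match to .gif/.jpg/.jpeg only. The helper name find_comic_images() and the sample HTML are made up for illustration, and the pattern assumes quoted src attributes; real pages may need a looser pattern.

```php
<?php
// Hypothetical helper: return the src of every <img> tag whose URL
// ends in .gif, .jpg or .jpeg (an optional ?query string is allowed).
function find_comic_images($html) {
    $pattern = '/<img[^>]+src\s*=\s*["\']([^"\']+\.(?:gif|jpe?g)(?:\?[^"\']*)?)["\']/i';
    preg_match_all($pattern, $html, $matches);
    // $matches[1] holds the captured URLs; drop duplicates and reindex.
    return array_values(array_unique($matches[1]));
}

// Made-up sample page: one .gif, one .png (ignored), one .JPG with a query string.
$sample = '<p><img src="http://img.example.com/strip20060515.gif" alt="">'
        . '<img class="x" src="/ads/banner.png">'
        . "<img src='/comics/today.JPG?v=2'></p>";
print_r(find_comic_images($sample));
```

Relative results such as /comics/today.JPG?v=2 would still need the site's scheme and host prepended before cURL can fetch them.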

If you switch back and forth to text/binary, you'll want to get PHP
from CVS, cuz there was a bug:
http://bugs.php.net/bug.php?id=37061

Thank [deity] the PHP Devs could take my meandering ill-formed bug
report and turn it into something useful... :-^

> Anyone have a good pointer on this?  I've been going through the
> archives and some tutorials, but most seem for posting to another site,
> not snagging files.

It should look something like this:

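<?php
$cookie_file = '/path/to/php/writable/file/somewhere/cookies.txt';
$image_dir = '/some/path/to/where/you/want/images';

//basic curl handle:
$curl = curl_init();
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_COOKIEFILE, $cookie_file);
curl_setopt($curl, CURLOPT_COOKIEJAR, $cookie_file);

//snag HTML
curl_setopt($curl, CURLOPT_URL, 'http://example.com');
$html = curl_exec($curl);
if ($html === false) {
  die('Could not fetch page: ' . curl_error($curl));
}

//Find the images:
//This pattern is almost for sure wrong cuz I suck at PCRE:
preg_match_all('/<img[^>]+src="([^"]+)"/iS', $html, $images);
$images = $images[1];

//snag each image in turn:
curl_setopt($curl, CURLOPT_BINARYTRANSFER, 1);
foreach($images as $image_url){
  //NOTE: assumes $image_url is absolute; a relative src needs the
  //site's scheme and host prepended first
  curl_setopt($curl, CURLOPT_URL, $image_url);
  $image = curl_exec($curl);
  if ($image === false) continue; //skip failed downloads
  //save under the file name only, since the full URL contains slashes:
  $filename = basename((string) parse_url($image_url, PHP_URL_PATH));
  if ($filename === '') continue;
  file_put_contents("$image_dir/$filename", $image);
}
curl_close($curl);
?>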

This is untested code off the top of my head, but it should be pretty
close.
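For the "directory based off the date" part of the question, here is one possible sketch. The helper name local_image_path(), the Y-m-d directory layout, and the 0755 mode are all assumptions for illustration, not anything from the original post.

```php
<?php
// Hypothetical helper: build a local path like "<base>/2006-05-15/today.gif"
// for an image URL, creating the dated directory if it does not exist yet.
function local_image_path($base_dir, $image_url, $timestamp = null) {
    $timestamp = ($timestamp === null) ? time() : $timestamp;
    $dir = rtrim($base_dir, '/') . '/' . date('Y-m-d', $timestamp);
    // Create the dated directory on first use (mode 0755 is an assumption).
    if (!is_dir($dir) && !mkdir($dir, 0755, true)) {
        throw new RuntimeException("Could not create $dir");
    }
    // Keep only the file name from the URL path, dropping host,
    // directories and any query string.
    $name = basename((string) parse_url($image_url, PHP_URL_PATH));
    if ($name === '') {
        $name = md5($image_url); // fallback when the URL has no file name
    }
    return $dir . '/' . $name;
}

// Example: store today's download under the system temp directory.
$path = local_image_path(sys_get_temp_dir() . '/comics',
                         'http://img.example.com/strips/today.gif?v=2');
echo $path, "\n";
```

The returned path could be handed straight to file_put_contents() inside the download loop.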

You could maybe use Tidy or something DOM-like to find the images in
the page's HTML if you were more of a purist, but I'm assuming that
isn't the issue, based on your post.
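For the DOM-like route, a minimal sketch using PHP's DOM extension might look like this. It assumes the DOM extension is available, and the helper name find_images_dom() and the sample HTML are hypothetical.

```php
<?php
// Hypothetical helper: list <img> src values with DOMDocument instead of
// a regex, keeping only .gif/.jpg/.jpeg files.
function find_images_dom($html) {
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings internally
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = array();
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Judge the file type by the URL path, ignoring any query string.
        $path = (string) parse_url($src, PHP_URL_PATH);
        if (preg_match('/\.(?:gif|jpe?g)$/i', $path)) {
            $urls[] = $src;
        }
    }
    return $urls;
}

print_r(find_images_dom(
    '<p><img src="a.gif"><img src="b.png"><img alt="x" src="/c/d.jpeg?x=1"></p>'
));
```

This copes with unquoted or oddly ordered attributes that would trip up the regex, at the cost of parsing the whole page.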

-- 
Like Music?
http://l-i-e.com/artists.htm

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

