Background: I'm using cURL to snarf down a web page and examine an image in that page, where I would like to be able to use imagecolorat() (http://php.net/imagecolorat) on the image. There are some wrinkles, however, best explained by a slimmed-down sample program:

    <?php
    function foo(){
      global $curl;
      if (!isset($curl)) $curl = curl_init();

      //Fetch HTML
      curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
      curl_setopt($curl, CURLOPT_COOKIEJAR, 'cookies');
      curl_setopt($curl, CURLOPT_COOKIEFILE, 'cookies');
      curl_setopt($curl, CURLOPT_URL, 'http://example.com');
      $html = curl_exec($curl);

      //Fetch image:
      preg_match('/<img src="([^"]*)">/', $html, $image_url);
      $image_url = $image_url[1];
      curl_setopt($curl, CURLOPT_BINARYTRANSFER, 1); //Getting binary data
      curl_setopt($curl, CURLOPT_URL, $image_url);
      $image_string = curl_exec($curl);
      curl_setopt($curl, CURLOPT_BINARYTRANSFER, 0); //Set it BACK to text!

      //SOMETIMES that URL sends me this for an image:
      $bad_data = '<META HTTP-EQUIV=Refresh URL="0;http://example.com">';
      if (stristr($image_string, $bad_data)){
        //start all over again:
        return foo();
      }

      //Use GD to get image:
      $image = imagecreatefromstring($image_string);
      //Begin analysis
      //irrelevant to the problem, deleted.
      $result = 'foo';
      return $result;
    }

    //Assume the image changes on every page hit, and we call foo() a LOT
    for ($i = 0; $i < 100000; $i++){
      echo foo();
      sleep(mt_rand(1, 5)); //Don't kill their server
    }
    ?>

NOW, for the problem[s].

#1. If I don't use BINARYTRANSFER, then imagecreatefromstring() segfaults pretty much every time. Well, usually, anyway. Presumably that's because cURL/PHP are pretending the string is null-terminated when it's not, and then handing a corrupted image string to GD, and that's bad. Or perhaps, without BINARYTRANSFER, some sort of CRLF correction is corrupting the binary data. I dunno, really. I just figured I've got binary data coming in, so I must want BINARYTRANSFER, based on what I can find documented. So, assuming BINARYTRANSFER means what I think it means, I need it.
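A minimal sketch of a more defensive image fetch, offered under stated assumptions rather than as a known fix for the segfaults. It assumes PHP 5.1.3 or later: the current PHP manual says CURLOPT_BINARYTRANSFER has no effect from that version on, because CURLOPT_RETURNTRANSFER always returns the raw output, so the sketch never sets it. It also uses a fresh cURL handle per request instead of the global one in foo(), so no option can carry over between requests, and it checks the response before GD ever sees it. fetch_image_bytes(), looks_like_image() and looks_like_meta_refresh() are hypothetical helper names, not PHP or cURL functions. Because PHP strings store an explicit length and may hold any byte, including "\0", there is no separate "binary string" type; comparing the first bytes with the JPEG, PNG and GIF signatures is one way to approximate the is_binary_string() check asked about in #5.

```php
<?php
// Sketch only: fetch_image_bytes(), looks_like_image() and
// looks_like_meta_refresh() are hypothetical helpers, not PHP built-ins.
// Assumes PHP >= 5.1.3, where CURLOPT_RETURNTRANSFER already returns the
// raw bytes, so CURLOPT_BINARYTRANSFER is not set at all.

// Returns 'jpeg', 'png' or 'gif' if $bytes starts with that format's
// signature ("magic number"), otherwise null.
function looks_like_image($bytes)
{
    $signatures = array(
        "\xFF\xD8\xFF"      => 'jpeg',
        "\x89PNG\r\n\x1A\n" => 'png',
        'GIF87a'            => 'gif',
        'GIF89a'            => 'gif',
    );
    foreach ($signatures as $magic => $type) {
        // strncmp() compares at most strlen($magic) bytes and does not
        // stop at a NUL byte, so it is safe on binary data.
        if (strncmp($bytes, $magic, strlen($magic)) === 0) {
            return $type;
        }
    }
    return null; // unknown format, e.g. the HTML refresh page
}

// stripos() searches by the string's stored length, so NUL bytes inside
// JPEG data do not cut the search short.
function looks_like_meta_refresh($bytes)
{
    return stripos($bytes, '<meta http-equiv=refresh') !== false;
}

// Fetch $url with a fresh handle so options set for an earlier request
// cannot leak into this one. Cookies still persist across calls through
// the shared $cookie_file. Returns the raw image bytes, or null.
function fetch_image_bytes($url, $cookie_file)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
    $body = curl_exec($ch);
    // Content-Type header of the response; not a string if none was sent.
    $content_type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    // Releasing the handle lets libcurl write the cookie jar to disk.
    unset($ch);

    if (!is_string($body) || $body === '') {
        return null; // transfer failed or empty response
    }
    if (is_string($content_type) && stripos($content_type, 'image/') !== 0) {
        return null; // server says this is not an image (e.g. text/html)
    }
    // Reject the META refresh page the server sometimes sends instead of a
    // JPEG, and anything else that does not start with an image signature.
    if (looks_like_meta_refresh($body) || looks_like_image($body) === null) {
        return null;
    }
    return $body;
}
```

In foo(), the image fetch would then shrink to `$image_string = fetch_image_bytes($image_url, 'cookies');`, with a null result meaning "start all over again" and only a non-null result passed to imagecreatefromstring(). This guarantees GD is never handed the META refresh page; whether it also avoids the GD crashes on 5.0.4/5.1.2 is untested.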
I've put in a bug report, and pajoye is being VERY helpful in (hopefully) getting the segfault turned into an E_ERROR instead:
http://bugs.php.net/bug.php?id=37005

So this one will probably get resolved, eventually. But I'm hoping for a pointer to a longer explanation of what BINARYTRANSFER actually does, as I've only found rather brief, circular definitions on php.net so far, and I'm not finding anything on the libcurl page here:
http://curl.haxx.se/libcurl/c/curl_easy_setopt.html

A quick Google also yielded only the barest circular definition:

    CURLOPT_BINARYTRANSFER: TRUE to return the raw output when CURLOPT_RETURNTRANSFER is used.

I mean, yeah, guys, I know what the words "binary" and "raw output" mean, and the docs pretty much tell me they are synonyms... That's not real useful, eh? :-) :-) :-)

#2. Once I start using BINARYTRANSFER, however... I *still* get segfaults sometimes, even when everything else seems to be okay. This is happening in *all* of these versions, using the CGI build from the command line:

    PHP 5.0.4
    PHP 5.1.2
    PHP 5.1.2RC3

So perhaps my use of BINARYTRANSFER is completely wrong, and merely masks the real problem a little bit? The segfault DOES happen at different points in the different versions of PHP:

    5.0.4 segfaults within the call to imagecreatefromstring()
    5.1.2RC3 segfaults at some later point

#3. It seems like once I set BINARYTRANSFER to 1, setting it back to 0 does not take effect... I say this because after a recursive call to foo() to start over, I get $html filled with data such as:

    <html><head>...</head><body>...</body></html>ZZZZZZZ...ZZZZ?more garbage data

I.e., it seems like curl and/or PHP are ignoring the null terminator and using some other indicator to define the end of a string. As additional evidence, I get messages such as:

    Run-time warning.
    String is not zero-terminated ( ) (source: /php-5.1.2/Zend/zend_variables.h:45) in /script.php:128
    /php-5.1.2/Zend/zend-hash.c(754) : ht=0x8381124 is being cleaned

Now, I dunno what all that is supposed to mean, but I'm pretty sure it's a sign of things going drastically wrong, with a string being treated as binary data when it's not, or vice versa... Is it not possible to switch CURLOPT_BINARYTRANSFER back to 0? Or is 0 treated as TRUE in cURL and I need FALSE? Surely not, right, since PHP handles that internally...

#4. The complaint about a string not being zero-terminated is happening on this line:

    if (stristr($image_string, $bad_data)){

stristr() is supposed to be "binary-safe". My assumption, then, was that I could search inside a binary data string (a valid image) for a particular pattern (the HTML they sometimes send out instead of a JPEG) to detect when they've done that... So, apparently, "binary-safe" doesn't mean what I think it means... Or have I found another bug in PHP? Unlikely. What does binary-safe actually MEAN, anyway?

#5. Is there some way to distinguish between a binary string and a "normal" string? I.e.:

    $image_string = file_get_contents('image.jpg');
    if (is_binary_string($image_string))

There does not seem to be an is_binary_string() function, just is_string()... How can I check?

Some things I have considered:

If I abandon the COOKIEJAR/COOKIEFILE, I can manage the cookies myself and, hopefully, detect some headers that the server is HOPEFULLY sending out before this goofy META tag that refreshes the whole page when I've asked for a JPEG.

I may be wasting my time with this BINARYTRANSFER because it's not what I think it is.

[The following will make sense only to a select few readers...]
Given the PHP/GD double-free bug, and the fixes of bundling GD and/or upgrading to the latest GD and PHP (which have functions specifically so that PHP can free the RAM instead of GD doing it), what combination of PHP/GD versions, bundled or separate, is *most* stable for CGI/CLI usage? Does CGI versus CLI have any real effect on GD?... Maybe I've missed a whole thread of research here. I can never remember why CGI and CLI are different, though I've re-read the page a lot. The differences always seem like a whole lot of nothing for my usage habits, as I recall, but maybe I'm missing an implication in my reading.

I did my initial tests with this script using local image files instead of cURL and the real data, as I didn't want to pound the server with development mistakes (infinite loops, etc.). The local files were originally fetched from that server with basic PHP file operations, though, so I thought I was using representative sample data. Only when I try to use cURL to get the image does everything fall apart.

I can save the cURL-fetched data using file_put_contents() and, sure enough, I can crash the script using those local files when they were fetched without BINARYTRANSFER. Files saved from cURL with BINARYTRANSFER do not seem to crash it, or at least not nearly as often. The main exception is that if I save their "META Refresh" page as a JPEG file and call imagecreatefromstring(file_get_contents()) on that, I can get a segfault.

I guess I'm at the point now where I have so MANY theories about what to try next that I just don't know what to do. Any collective wisdom from the list on which road to take would be most welcome.

Thanks for reading. Sorry it got so long, but I'm really thrashing around here, trying to make heads or tails of any of this.

-- 
Like Music?
http://l-i-e.com/artists.htm

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php