For what it's worth, for me telnet php.net 80 (then GET ...) takes ~25
seconds, most of which seems to be the reverse lookup. If I telnet
straight to php.net's IP address and do the same, it's instant. Doing
file_get_contents takes less than 1s either way, by hostname or by IP.
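
If you want to see where the time goes from inside PHP, something like
the following might help. It is only a rough sketch: timed() and probe()
are names made up for this example, the 5-second timeout is arbitrary,
and redirects are switched off so the by-IP fetch does not quietly fall
back to a hostname lookup.

```php
<?php
// Hypothetical helper: run a callable, return [result, elapsed seconds].
function timed(callable $fn): array
{
    $start = microtime(true);
    $result = $fn();
    return [$result, microtime(true) - $start];
}

// Hypothetical helper: time each stage of fetching http://$host/.
// Not called automatically, since it needs network access.
function probe(string $host, float $timeout = 5.0): array
{
    $seconds = [];

    // Forward lookup; gethostbyname() returns its input unchanged on failure.
    [$ip, $seconds['forward lookup']] = timed(function () use ($host) {
        return gethostbyname($host);
    });

    // Reverse lookup, the step that seemed slow over telnet.
    [, $seconds['reverse lookup']] = timed(function () use ($ip) {
        return @gethostbyaddr($ip);
    });

    // Fetch by hostname, then by IP with an explicit Host header.
    $byName = stream_context_create(['http' => [
        'timeout'         => $timeout,
        'follow_location' => 0,
    ]]);
    $byIp = stream_context_create(['http' => [
        'timeout'         => $timeout,
        'follow_location' => 0,
        'header'          => "Host: $host\r\n",
    ]]);
    [, $seconds['fetch by name']] = timed(function () use ($host, $byName) {
        return @file_get_contents("http://$host/", false, $byName);
    });
    [, $seconds['fetch by IP']] = timed(function () use ($ip, $byIp) {
        return @file_get_contents("http://$ip/", false, $byIp);
    });

    return $seconds;
}

// Usage:
// foreach (probe('php.net') as $stage => $s) printf("%-15s %.3fs\n", $stage, $s);
```

If one stage eats nearly all of the time on the FC4 boxes and none of it
on the desktop, that at least narrows down where to look.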
jon
(Replied directly to the list, 'cause I know you're on it, and I don't
see the need for you to get two copies of the same thing. :-)
Richard Lynch wrote:
The canonical PHP example of web-scraping:
<?php echo file_get_contents('http://php.net/');?>
fails on a machine I'm using.
I'm laying out here all the things I've done and eliminated, and it
got awfully long...
Short Version:
FC4 + LAMPP on 2 different private IP boxes at day job
file_get_contents('http://php.net') hangs and times out after 2 minutes.
telnet php.net 80 | GET / HTTP/1.0 hangs and times out after 2 minutes.
wget php.net WORKS
links php.net WORKS
My windows/cygwin desktop on the same subnet WORKS
[Well, windows is broken, of course, but that's not relevant here :-)]
ping and traceroute both work fine everywhere
What configuration boo-boo from a stock FC 4 + LAMPP install could
manage to break file_get_contents and telnet, but wget and links
work?!
Long Version:
I've checked allow_url_fopen with phpinfo() and php -i
allow_url_fopen => On => On
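
The same check can also be made from inside the failing script itself,
which rules out the web server and the command line reading different
php.ini files. A minimal sketch, assuming nothing beyond stock PHP;
url_fopen_enabled() is a name made up for this example:

```php
<?php
// Hypothetical helper: interpret the runtime value of allow_url_fopen.
// ini_get() returns a string such as "1", "", "0", or "On".
function url_fopen_enabled(): bool
{
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

printf("allow_url_fopen: %s\n", url_fopen_enabled() ? 'on' : 'off');
// Which php.ini was actually loaded (false if none was).
printf("loaded php.ini:  %s\n", php_ini_loaded_file() ?: '(none)');
// Default timeout, in seconds, for socket-based streams.
printf("default_socket_timeout: %s\n", ini_get('default_socket_timeout'));
```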
Further analysis reveals some odd info:
telnet php.net 80
GET / HTTP/1.0
Host: php.net
[yes, I hit enter here]
just sort of hangs until it times out in TWO MINUTES
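
The telnet session above can be approximated from PHP with a raw socket,
which makes it possible to put separate timeouts on the connect and the
read and see which one stalls. This is a sketch, not a diagnosis:
build_request() and raw_get() are names made up for this example, and
the 10-second timeout is an arbitrary choice.

```php
<?php
// Build the same request typed into telnet above. The trailing blank
// line (the final "\r\n") tells the server the headers are done.
function build_request(string $host, string $path = '/'): string
{
    return "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
}

// Hypothetical helper: send the request over a raw socket and report
// which phase failed. Not called automatically; it needs network access.
function raw_get(string $host, int $port = 80, float $timeout = 10.0): string
{
    $start = microtime(true);
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return sprintf("connect failed after %.1fs: %s (%d)",
            microtime(true) - $start, $errstr, $errno);
    }
    $connected = microtime(true) - $start;

    stream_set_timeout($fp, (int) $timeout); // read timeout, in seconds
    fwrite($fp, build_request($host));
    $status = fgets($fp);                    // first line of the response
    $meta = stream_get_meta_data($fp);
    fclose($fp);

    if ($meta['timed_out'] || $status === false) {
        return sprintf("connected in %.1fs, but no response", $connected);
    }
    return sprintf("connected in %.1fs, got: %s", $connected, trim($status));
}

// Usage: echo raw_get('php.net'), "\n";
```

Passing an IP address instead of a hostname to raw_get() takes name
resolution out of the picture for the connect step.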
So you'd think that it's obviously the DNS records screwed up somehow,
with an extra-long 2-minute timeout instead of the usual 30 seconds.
Buuuuuuut:
wget http://php.net
works flawlessly
links http://php.net
works flawlessly
I can ping php.net just fine -- which is maybe a no-brainer with wget
and links working, but I like to check.
traceroute also looks normal to me, though I'm no expert
[aside: How come guys set things up so complex they gotta bounce my
routing between four of their own machines in the same data center?
What's up with that? (shrug)]
So, apparently, wget and links are doing something "extra" that breaks
through whatever this roadblock is for file_get_contents and telnet
80.
I thought it might maybe be some kind of header redirect support that
is lacking, but then telnet 80 would behave differently, and
file_get_contents should work for that. Plus I tried it on my own
site that does not have any kind of redirect headers going out, and
got the same results. file_get_contents/telnet fail. wget/links
work.
Now I realize that wget and links are vastly superior weapons and send
all kinds of extra headers.
But I can do the above script on other boxes, and it works fine, so
it's probably not the web-servers denying access on the basis of
sparse headers.
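
One way to test the sparse-headers theory directly would be to have
file_get_contents send a few extra headers through a stream context.
A minimal sketch: context_with_headers() and the User-Agent string are
made up for this example, and the headers chosen are only a guess at
what a fuller client might send.

```php
<?php
// Hypothetical helper: build an HTTP stream context that sends extra
// request headers and a User-Agent, with a short timeout.
function context_with_headers(array $headers, string $userAgent, float $timeout = 10.0)
{
    return stream_context_create(['http' => [
        'method'     => 'GET',
        'header'     => implode("\r\n", $headers) . "\r\n",
        'user_agent' => $userAgent,
        'timeout'    => $timeout, // seconds
    ]]);
}

$ctx = context_with_headers(
    ['Accept: */*', 'Connection: close'],
    'sparse-header-test/0.1' // made-up User-Agent string
);

// Usage (needs network access):
// var_dump(strlen((string) @file_get_contents('http://php.net/', false, $ctx)));
```

If the request still hangs with these headers in place, sparse headers
are probably not the cause.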
Now this could be a TWO MINUTE warning since the Bears are 5-0 or
whatever, but I think I'll ignore that possibility for now.
I'm also fairly sure it's not even a PHP problem, but don't know where
to turn, so I'm posting here in time-honored fashion :-)
If it was consistently failing no matter the software used to scrape
(php, wget, links) I'd know it was DNS or the network card or
whatever.
But what would make telnet and file_get_contents fail and time out
after 2 minutes, while wget and links work flawlessly?
Where would I even start? I'm checking in-house with our IT guys, but
they're mostly Windows guys, so if this is something specific to LAMP,
I'm down the tubes there.
The box is a duplicate of another box, and I installed everything
rather quickly to make them both "match" as far as I could tell.
Fedora Core 4
LAMPP
I don't know much about LAMPP, except they put everything in /opt
which was annoying, but it all works, so I just left it alone.
The RELEASE_NOTES document it as:
[2006-01-08] XAMPP for Linux 1.5.1
Since telnet is not acting right, I doubt that LAMPP is the culprit...
Oh, and of course I checked the other box, of which this is a
duplicate, and it behaves the same way.
The only thing I can tell you about our network topology is:
Box #1: 192.168.4.5 (the one the bulk of this email is about)
Box #2: 192.168.5.123 (the original just referenced)
Desktop: 192.168.4.13 (Windows box, with Cygwin, works fine)
All other Internet things I've done from my desktop work just fine --
Including the file_get_contents() referenced above. So now I've
narrowed it down to FC4 and/or LAMPP configuration, but have no idea
what to do next.
I'm definitely not a hardware guy, and not a network admin guy either,
so hopefully all this has made the answer painfully obvious to
somebody who is and they can help this poor befuddled application
developer out. :-)
Apologies for this NNOT post, but even a pointer of where to start
would be good. I suppose LAMPP would be my next guess, since it's
inconceivable that FC4 would be this borked without a zillion alarms
going off, but how could LAMPP manage to break this?
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php