On Thu, June 23, 2005 3:24 am, José Miguel López-Coronado said:
> I have seen how to use cURL to retrieve results from a web site after
> processing a form. The problem is that I want to completely simulate the
> submitting of a form; I mean, I want to "enter" the page on the server
> instead of retrieving the results into my own page. I'm trying to use a
> PHP script to log in to a site without having to enter user and pass.
> Does anyone know how to do this using cURL, or any other option like
> header() or something like that?

You will need to call curl_exec() several times.

First, you'll use cURL to *read* the page with the FORM in it. This may (or may not) trigger some Cookies and/or some embedded tokens in the FORM that you may need. If it has no cookies and nothing fancy embedded in the FORM, then you may be able to comment out this first chunk of code, after you figure out that you don't need it, and proceed to Step 2. Do *NOT* delete this code. You may need it tomorrow if they change their login routine. Been there. Done that. Keep the code.

Second, you'll need to send a POST with cURL of the username/password and, again, capture all the results. This step will almost certainly send a Cookie, or embed some kind of token in the URLs and FORMs, that you need to catch and pass on to all subsequent HTTP/HTTPS requests.

Third, you can request whatever it is you wanted in the first place, but you need to pass in the Cookies and tokens you got from the second step.

Many of these pages might, or might not, return 302 "Object Moved" headers. If it's an MS/ASP/.NET site, it will have a bunch of them, mostly bogus, because you can usually bypass them. Again, you want to keep the code around in case some day they change their login procedure and those stupid redirects actually have meaning. Right now, they just waste resources and sell more hardware for Microsoft, but is that really a shock?
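To make the three steps concrete, here's a minimal sketch. The URLs, the field names ('username', 'password'), and the page you fetch at the end are all hypothetical -- read the real ones out of the site's FORM. Since I manage cookies by hand, the sketch includes small helpers to pull "name=value" pairs out of Set-Cookie response headers and hand them back on the next request:

```php
<?php
// Collect name=value pairs from any Set-Cookie lines in raw response
// headers, so they can be replayed on the next request.
function cookies_from_headers($headers)
{
    $jar = array();
    if (preg_match_all('/^Set-Cookie:\s*([^=]+)=([^;\r\n]*)/mi', $headers, $m)) {
        foreach ($m[1] as $i => $name) {
            $jar[trim($name)] = $m[2][$i];
        }
    }
    return $jar;
}

// Turn the jar back into a single "Cookie: a=1; b=2" request header.
function cookie_header($jar)
{
    $pairs = array();
    foreach ($jar as $name => $value) {
        $pairs[] = "$name=$value";
    }
    return 'Cookie: ' . implode('; ', $pairs);
}

// One request. A fresh handle every time (see tip 3 below), and we keep
// the response headers because that's where the cookies are.
function fetch($url, $post = null, $jar = array())
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    if ($jar) {
        curl_setopt($ch, CURLOPT_HTTPHEADER, array(cookie_header($jar)));
    }
    if ($post !== null) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($post));
    }
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}

// Step 1: read the login page. Step 2: POST the credentials, merging in
// any new cookies. Step 3: fetch what you actually wanted, cookies and all.
function login_and_fetch()
{
    $jar = cookies_from_headers(fetch('http://example.com/login.php'));
    $jar = array_merge($jar, cookies_from_headers(fetch(
        'http://example.com/login.php',
        array('username' => 'me', 'password' => 'secret'),
        $jar
    )));
    return fetch('http://example.com/members.php', null, $jar);
}
```

If the site also embeds a token in the FORM, you'd scrape it out of the Step 1 HTML and add it to the POST array in Step 2.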
Reverse-engineering the login of a page can be challenging. Sometimes stuff you think is totally unnecessary turns out to be needed. Here are some hard-won tips:

1. The button clicked for login (or other form submission) may or may not have a NAME="..." attribute. If it does, you *may* need to pass in its VALUE="..." as one of your arguments in the POST/GET with cURL to make it work. While it's less likely you need it, I've seen at least one site that RELIED on the default NAME="..." being "Submit" and its value of "Submit" and, yes, you had to pass those in to get past the gate. [shudder]

2. You need to catch/send Cookies. I never did get that automated Cookie Jar feature of cURL to work for me. But it helps to see the Cookie variable names and values to figure out what the other programmer did (or didn't do) for their login process anyway, so you might as well manage them by hand.

3. In some cases, I needed a whole new cURL handle to send the next request. Later research suggested that cURL was still doing a POST by default after the previous request had been a POST, and that I should have overridden that... It was easier to just keep the code that gets a fresh cURL handle. In a high-performance situation, you might care to fix that properly. I didn't.

4. Keep debug output that dumps the HTML you get at each step. Comment it out, but keep it. Document in the code itself which elements you found you needed in order to achieve the next HTTP interaction. Also comment anything funky that you thought would be relevant/needed data, but turned out to be useless, or even detrimental to send back on the next request. Yesterday's junk could be tomorrow's gold.

5. Dump out all headers and all HTML in your first debug code. When you think you know what's relevant, add more debug code below that to dump out just the relevant stuff, and comment out the verbose debug code that dumps everything.
When you're *SURE* you have it right, and can log in not just today, but also tomorrow, and also from another computer or three, then comment out the concise debug code too. Keep all of it. You'll need it again when they change their login.

6. Set up your own PHP script on your own server that spits out all the headers sent with an HTTP request, and then be ready to copy/paste that output in as your own request headers. Sometimes the silliest things are used by them to try to stop your "robot" from logging in. User-Agent springs to mind. Anything your browser sends as a header is "fair game" for them to be checking, even if they are violating HTTP standards. Hey, this is the real world. You *WILL* find a site that violates standards really fast if you dig into this very much.

7. It's kind of "fun", in a hacker sort of way, to work through the login process of another site and see how it all fits together. It's certainly instructive! It can also be challenging. If you're stuck, take a break. Staring at the code and their HTTP output is unlikely to be fruitful after any length of time.

-- 
Like Music? http://l-i-e.com/artists.htm

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php