On Monday 31 July 2006 17:36, John Gunther wrote:
> I'm trying to programmatically retrieve a sales tax lookup page using
> file_get_contents() but the page doesn't return data unless a session id
> is first retrieved and then supplied.
>
> You can see how it works as follows:
>
> The first time I send the following request:
> http://www7.nystax.gov/STLR/stlrHome?street=48%20central%20ave&zip=12472&B1=Lookup%20Address
> I get an empty form back, and one of the response headers
> is Set-Cookie with a value of, say,
> JSESSIONID=0001WQEmZF6tI-yClq4S9_7a8ii:10amela49;Path=/
>
> If I then reload the same URL, the resulting page includes the desired
> info, probably because the second time one of the request headers is
> Cookie with a value of JSESSIONID=0001WQEmZF6tI-yClq4S9_7a8ii:10amela49
>
> I want to get the second version of the page, the one with data, using
> file_get_contents(), but it appears I first have to get the page to
> return a sessionid and then I have to send it in a Cookie header the
> second time.
>
> Reading the first GET's response headers and sending the needed request
> header on the second GET - in combination with file_get_contents() - is
> just beyond me. Can anyone enlighten me?
>
> John Gunther

I deal with screen-scraping a lot at work. I would suggest using cURL: let
it store the cookie from the first response, then request the page again so
the cookie is sent back and you get the data you need.

Check out http://us3.php.net/manual/en/ref.curl.php and the curl man page
for more info.

Or, you can use your own HTTP implementation ;)

HTH

--
Ray Hauge
Programmer/Systems Administrator
American Student Loan Services
www.americanstudentloan.com
1.800.575.1099

--
PHP General Mailing List (http://www.php.net/)
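
For illustration, here is a minimal sketch of both routes discussed in this thread. It is untested against the tax site itself and assumes the server behaves as John describes (first response sets JSESSIONID, second request must send it back). The function names fetchWithSession(), sessionCookieFromHeaders() and fetchWithStreams() are made up for this example; the cURL options and stream context keys are the standard PHP ones.

```php
<?php
// Sketch only: function names are hypothetical, chosen for this example.

/**
 * cURL route: request $url twice on one handle. With the cookie engine
 * enabled, the cookie set by the first response is sent on the second.
 */
function fetchWithSession(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_COOKIEFILE, '');       // empty string: in-memory cookie engine

    curl_exec($ch);          // first request: server answers with Set-Cookie
    $body = curl_exec($ch);  // second request: stored cookie is sent back
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

/**
 * Manual route, step 1: pull "NAME=value" out of raw response header lines.
 * Returns null when no matching Set-Cookie header is present.
 */
function sessionCookieFromHeaders(array $headers, string $name = 'JSESSIONID'): ?string
{
    $pattern = '/^Set-Cookie:\s*' . preg_quote($name, '/') . '=([^;]+)/i';
    foreach ($headers as $line) {
        if (is_string($line) && preg_match($pattern, $line, $m)) {
            return $name . '=' . $m[1];
        }
    }
    return null;
}

/**
 * Manual route, step 2: first request via get_headers() to obtain the
 * cookie, second via file_get_contents() with a Cookie request header.
 */
function fetchWithStreams(string $url): string
{
    $headers = get_headers($url);
    $cookie  = sessionCookieFromHeaders($headers ?: []);
    if ($cookie === null) {
        throw new RuntimeException('No session cookie in first response');
    }
    $context = stream_context_create([
        'http' => ['method' => 'GET', 'header' => "Cookie: $cookie\r\n"],
    ]);
    $body = file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException('Second request failed');
    }
    return $body;
}
```

Either fetchWithSession($url) or fetchWithStreams($url) would be called with the lookup URL from the original message. The cURL version is shorter because cURL tracks the cookie itself. If the cookie has to survive between script runs, CURLOPT_COOKIEJAR can write it to a file.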