Re: Retrieve pages from an ASP driven site

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 2, 2012 at 11:37 PM, EPA WC <epawcweb@xxxxxxxxx> wrote:
> Hi List,
>
> I am trying to write a crawler to go through web pages at
> http://www.freebookspot.es/CompactDefault.aspx?Keyword=. But I am not
> quite familiar with how asp uses _doPostBack function with the "next"
> button below the book list to advance to the next page. I hope someone
> who knows ASP well can help out here. I need to know how to retrieve
> next page with PHP code.
>
> Kind regards,
> Tom


Looking at that page source, I think this might be a bit problematic.

Notice that practically the whole page is inside a form. When you get
down to the "Next> " button, that is going to sumbmit the form with
it's appropriate fields set. If you look at the beginning of the form,
you'll see some interesting fields, one in particular, __VIEWSTATE is
pretty clearly an encoded value of some sort.

When your crawler parses the page, it will have to stash the field
values that the form sets in order to process the form correctly to
get the next page of entries by simulating a POST-data submit. This is
(probably?) most easily handled via libcurl.

Unsolicited advice: Many sites do not appreciate scraping activity;
make sure your crawler obeys robots.txt rules, and do not overtax the
site with crawler activity.

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



[Index of Archives]     [PHP Home]     [Apache Users]     [PHP on Windows]     [Kernel Newbies]     [PHP Install]     [PHP Classes]     [Pear]     [Postgresql]     [Postgresql PHP]     [PHP on Windows]     [PHP Database Programming]     [PHP SOAP]

  Powered by Linux