Fwd: php page scrapping challenge!

Hey Paragasu,

 Sounds like fun, and not really that difficult. It is a horrible
 site, but it shouldn't take much to write a script for it. They do
 not, in fact, use JavaScript to pull the movie times from the
 database. They reload the page with added query string variables
 (these are from my run-through):

 isSearchBy=cin             // How are we searching
 visCinID=1000              // What is the cinema ID
 visMovieName=Iron+Man      // What movie do we want to see?
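
 As a minimal sketch, the query string above could be assembled in PHP
 like this. The function name buildShowtimeQuery is hypothetical, and
 the three parameter names are simply the ones seen in my run-through;
 the page they belong to is not shown here.

```php
<?php
// Hypothetical helper: builds the query string from the three
// parameters observed above. The parameter names come from the site;
// the function name and signature are assumptions for illustration.
function buildShowtimeQuery(string $cinemaId, string $movieName): string
{
    $params = [
        'isSearchBy'   => 'cin',       // search mode seen in the run-through
        'visCinID'     => $cinemaId,   // cinema ID
        'visMovieName' => $movieName,  // movie title
    ];

    // The default encoding (RFC 1738) turns spaces into "+".
    return http_build_query($params, '', '&');
}

echo buildShowtimeQuery('1000', 'Iron Man'), "\n";
// prints: isSearchBy=cin&visCinID=1000&visMovieName=Iron+Man
```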

 I'd give it a try, but I am not set up to use curl at the moment, and
 I don't expect to be in time to do this.

 What you need to do is access the URL that assigns you your session
 ID and store that ID for subsequent curl calls to the server; pass it
 with every request. You also need to find out where they generate
 what I assume is the dynamic part of their URL, so you can use it to
 reach the actual movie URL. In my URL that part was:
 .../(3yujtbmepau3jb45a22gju55)/...
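
 A rough, untested sketch of that flow with PHP's curl extension
 follows. It assumes the session ID arrives as an ordinary cookie
 (kept in a cookie jar file) and that the parenthesised token shows up
 in the URL you land on after redirects; it may instead have to be
 pulled from a link in the page body. The function names
 extractUrlToken and fetchWithSession are hypothetical.

```php
<?php
// Pull the parenthesised token out of a string such as
// ".../(3yujtbmepau3jb45a22gju55)/...". Returns null if none is found.
function extractUrlToken(string $text): ?string
{
    return preg_match('#/\(([a-z0-9]+)\)/#i', $text, $m) === 1 ? $m[1] : null;
}

// Fetch a page, following redirects and keeping cookies in $cookieJar
// so the session cookie is sent again on later calls.
// Returns [response body, final URL after redirects].
function fetchWithSession(string $url, string $cookieJar): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,        // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects
        CURLOPT_COOKIEJAR      => $cookieJar,  // write cookies here on close
        CURLOPT_COOKIEFILE     => $cookieJar,  // read cookies from here
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("curl failed: $error");
    }

    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);

    return [$body, $finalUrl];
}

// Intended use (not run here, since it needs network access and the
// real entry URL, which is an assumption):
//   [$html, $landed] = fetchWithSession($entryUrl, '/tmp/cookies.txt');
//   $token = extractUrlToken($landed) ?? extractUrlToken($html);
```

 Reusing the same cookie jar file for every call is what carries the
 session ID along; the token, once found, goes back into the path of
 the movie URL.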

 Good luck with this project, and let us know how it goes.
 - Craige


On Sun, May 4, 2008 at 11:26 PM, paragasu <paragasu@xxxxxxxxx> wrote:
 > Well, this is going to be fun.
 >
 >  The website I am trying to scrape is http://www.cathayholdings.com.my/
 >  It is a cinema website with a very irritating design. They tried so
 >  hard to impose security that it is really not user friendly. The
 >  whole website is written in ASP.
 >
 >  I really hate having to hunt around for the show times of the latest
 >  movies, so I decided to build my own simple website to display the
 >  movies and show times from the Cathay cinema my own way.
 >
 >  But it has proven not so easy to do. The show dates and times are
 >  buried deep inside the online booking, so the user can see the
 >  showtimes only after clicking the online booking link. That link
 >  then opens in a new window and generates a cookie, and this cookie
 >  becomes part of the URL. So basically, two cookie values are passed
 >  to the server (one in the GET request and one in the HTTP header).
 >
 >  Apart from that, they use JavaScript (AJAX?) to pull the showtimes
 >  from the server, and only after you have clicked three times. OMG,
 >  I only want to know the time and have to go through this whole
 >  process.
 >
 >  Is it possible to use the PHP curl library to simulate the requests,
 >  just to get the movie name and showtime list from the server? Post
 >  your code.
 >
 >  ** No reward, just for PHP programming fun.
 >


