Re: a loop constructing the URLs and make PHP to fetch up to 10 thousand pages

On 23 October 2010 06:10, "jobst müller" <Floobee@xxxxxx> wrote:
> hello dear list - good morning!
>
>
> I am trying to figure out how to retrieve the same URL [see below] repeatedly with different query arguments, and I am wondering whether this is doable with PHP.
>
> On a side note: I guess we can do this with LWP::UserAgent. I am not sure that LWP::UserAgent has a method for looping through the query arguments; I tried to figure it out and dug deeper into the manpages and howtos. We can have a loop constructing the URLs and use
> LWP::UserAgent repeatedly:
>
>
> see the Code:
>
> for my $id (0 .. 100000)
>
> {
>
>   $ua->get($url."?id=21&extern_eid=".(0-$id));
>
>   # rest of the code
>
> }
>
>
>
> Well, alternatively we can add a request_prepare handler that computes and adds the query arguments before we send out the request. Do you think that this fits the needs?
>
> But wait - i want to do this with PHP!
>
> The goal: the following site lists many schools [see the page with the subsequent results - roughly more than 1000 pages]
>
> see this site: http://www-db.sn.schule.de/index.php?id=25
>
> I want to fetch the pages that are listed on this page, and I want to use PHP for this [job]; subsequently
> I want to parse them.
>
> The pages can be reached directly: in other words, the subpages of the overview can be reached via direct
> links. See the following.
>
>
> http://www-db.sn.schule.de/index.php?]id=21&extern_eid=1543
>
> http://www-db.sn.schule.de/index.php?]id=21&extern_eid=709
>
> http://www-db.sn.schule.de/index.php?]id=21&extern_eid=789
>
> http://www-db.sn.schule.de/index.php?]id=21&extern_eid=1297
>
>
>
> Well - I want to fetch all of those, and I am trying to do it with PHP and the
> loop mentioned above. Does this work?
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

You can go down that route. If the site has an API that lets you
gather the data another way (e.g. as one large file), then use that
instead.

I'd also make sure you are allowed to gather all that data. You may
run into issues with that, so check before you start.

Also, you had a typo in the URLs: there is an extra ] after the ? in each one.
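
For reference, the loop could be sketched in PHP roughly as below. This is only a minimal sketch under a few assumptions: allow_url_fopen is enabled so that file_get_contents() can open URLs, PHP 7.1 or later is in use, and a half-second pause between requests is acceptable to the site. The function names (buildSchoolUrl, fetchPage, fetchRange) and the pause length are made up for illustration; only the base URL and the id / extern_eid parameters come from the original post.

```php
<?php
// Base URL taken from the original post; everything else here is illustrative.
const BASE_URL = 'http://www-db.sn.schule.de/index.php';

/** Build the detail-page URL for one extern_eid (without the stray "]"). */
function buildSchoolUrl(int $externEid): string
{
    // http_build_query() URL-encodes the values and joins them with "&".
    $query = http_build_query(['id' => 21, 'extern_eid' => $externEid], '', '&');
    return BASE_URL . '?' . $query;
}

/** Fetch one page; returns the HTML, or null if the request failed. */
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    $context = stream_context_create(['http' => ['timeout' => $timeoutSeconds]]);
    // "@" suppresses the warning file_get_contents() raises on failure;
    // the false return value is checked instead.
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}

/**
 * Fetch every id from $firstId to $lastId, pausing between requests.
 * $fetch is any callable taking a URL and returning HTML or null, so the
 * loop can be tried out without touching the network.
 */
function fetchRange(int $firstId, int $lastId, callable $fetch, int $pauseMicroseconds = 500000): array
{
    $pages = [];
    for ($id = $firstId; $id <= $lastId; $id++) {
        $html = $fetch(buildSchoolUrl($id));
        if ($html !== null) {
            $pages[$id] = $html; // keep only pages that were actually fetched
        }
        usleep($pauseMicroseconds); // be gentle with the server
    }
    return $pages;
}
```

It would be called as, say, fetchRange(709, 712, 'fetchPage'), where the id range is arbitrary. Ids that fail to load are simply skipped, and parsing the returned HTML is a separate step.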



-- 
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY
