On 09/06/07, Angelo Bourik <jusanotherangel@xxxxxxxxx> wrote:
Hello all,
Hi,

If I understood your question correctly, you've written a spider script that runs via Apache. That script connects to a remote site, fetches a page by id and parses out the data you need, once per request to your own server (or does it run all 100,000 requests in a single request to your server?).

My first question is: why are you running your script under Apache, rather than as a command-line process? I don't see any advantage to having Apache parent these scripts for you.

As for having more than one script running at the same time, that's perhaps a matter of CPU usage, though it does sound odd that your server is not at least starting to run the others. Perhaps your script opens and locks your database in such a way that the others are blocked?

I'd recommend running your spider on the command line, outside of the server. If you really want to speed it up, split the spider into a fetching thread group that is network bound, and a parsing thread group that munges the queued responses. Or, just write your script in a faster language :-)

--
noodl

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@xxxxxxxxxxxxxxxx
   "   from the digest: users-digest-unsubscribe@xxxxxxxxxxxxxxxx
For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx
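
For what it's worth, the fetch/parse split suggested above might look roughly like the sketch below. I don't know what language the original spider is written in, so this uses Python purely for illustration. The URL pattern, the function names (fetch_page, parse_page, run_spider) and the thread counts are all assumptions of mine, not anything taken from the original script, and parse_page is only a placeholder for the real extraction logic.

```python
import queue
import threading
import urllib.request

# Hypothetical URL pattern; replace with the real remote site.
BASE_URL = "http://example.com/item?id={}"


def fetch_page(page_id):
    """Download one page by id (network bound)."""
    with urllib.request.urlopen(BASE_URL.format(page_id), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_page(html):
    """Placeholder parser: stands in for the real data extraction."""
    return len(html)


def run_spider(page_ids, fetch=fetch_page, parse=parse_page,
               n_fetchers=8, n_parsers=2):
    """Fetch pages in one thread group and parse them in another."""
    id_queue = queue.Queue()                # ids waiting to be fetched
    page_queue = queue.Queue(maxsize=100)   # fetched pages waiting to be parsed
    results = {}
    results_lock = threading.Lock()

    def fetcher():
        while True:
            page_id = id_queue.get()
            if page_id is None:             # sentinel: no more work
                break
            try:
                page_queue.put((page_id, fetch(page_id)))
            except Exception as exc:        # one bad page should not kill the thread
                print(f"fetch failed for id {page_id}: {exc}")

    def parser():
        while True:
            item = page_queue.get()
            if item is None:                # sentinel: no more work
                break
            page_id, html = item
            try:
                value = parse(html)
            except Exception as exc:
                print(f"parse failed for id {page_id}: {exc}")
                continue
            with results_lock:
                results[page_id] = value

    fetchers = [threading.Thread(target=fetcher) for _ in range(n_fetchers)]
    parsers = [threading.Thread(target=parser) for _ in range(n_parsers)]
    for t in fetchers + parsers:
        t.start()

    for page_id in page_ids:
        id_queue.put(page_id)
    for _ in fetchers:                      # one sentinel per fetcher
        id_queue.put(None)
    for t in fetchers:
        t.join()

    for _ in parsers:                       # fetchers are done; stop the parsers
        page_queue.put(None)
    for t in parsers:
        t.join()
    return results
```

Run from the command line rather than under Apache, something like run_spider(range(1, 100001)) would work through the ids with several downloads in flight at once. The bounded page_queue keeps the fetchers from racing too far ahead of the parsers. In CPython the parser threads share the interpreter lock, so if parsing turns out to be the bottleneck, a process pool may suit that half better than threads.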