Apache Multi Process Limit
Hello all,
I am in a bind with Apache's multi process limit. Let me explain what I am
doing. There's this website which has career details of all the football
players since the beginning of professional football. They have a simple web
form which allows you to look at a player's profile by entering his name or his
7-digit numeric id number (on that website).
One of my clients wants a list of all the players with a certain
"flag" in their profile. So I created an automatic form submission
and HTML parsing script to get details of all the players with that
"flag" in their profile. Without going into too much detail: after
applying a few pattern rules to the id number, the number of possible
id numbers comes to about 1 million instead of 10^7 (each of the 7
digits can be any of 0-9, so the total number of combinations is
10*10*10*10*10*10*10 = 10^7).
Therefore, to completely automate this process I wrote a script which would
generate an id number, submit the form with that id number, and parse the
resulting HTML profile for the "flag". If the script finds a hit on
the flag, it stores all the fields of that player in a database. This script is
working absolutely fine, but I was only getting about one check per second,
which means that I would have to leave the script running for about 11 to 12
days (about 1 million checks at one per second).
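
For illustration, the loop described above could be sketched in Python
roughly as follows. This is only a minimal sketch, not the actual script:
the names fetch_profile, has_flag, store and the marker string FLAG_MARKER
are hypothetical, the real pattern rules for the ids are not shown, and the
fetch step is passed in as a function so that nothing is assumed about the
real website's form. It also assumes that ids can have leading zeros.

```python
from typing import Callable, Iterable

# Hypothetical: stands in for whatever marks the "flag" in a profile page.
FLAG_MARKER = "FLAG"


def format_id(n: int) -> str:
    """Render a number as a 7-digit id, zero-padded on the left."""
    return f"{n:07d}"


def has_flag(html: str) -> bool:
    """Hypothetical check; a real script would parse the HTML properly."""
    return FLAG_MARKER in html


def scan(
    ids: Iterable[int],
    fetch_profile: Callable[[str], str],
    store: Callable[[str, str], None],
) -> int:
    """Check each id in turn and store the profiles that carry the flag.

    fetch_profile submits the form for one id and returns the HTML;
    store saves a hit (for example to a database). Both are supplied
    by the caller. Returns the number of hits.
    """
    hits = 0
    for n in ids:
        player_id = format_id(n)
        html = fetch_profile(player_id)
        if has_flag(html):
            store(player_id, html)
            hits += 1
    return hits


def estimate_days(checks: int, seconds_per_check: float = 1.0) -> float:
    """Total running time in days for a purely sequential scan."""
    return checks * seconds_per_check / 86_400
```

With one check per second, estimate_days(1_000_000) comes to roughly 11.6,
which matches the figure above.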
So I came up with the idea of dividing the check into ten parts, and I created
a separate script for each part. Now the first script checks the first 100
thousand combinations, the second checks the next 100 thousand combinations,
and so on.
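
The split into ten parts can be sketched like this in Python. It is a sketch
under assumptions, not the scripts actually used: split_range and run_parts
are hypothetical names, the numbers are treated as positions in the list of
about 1 million candidate ids (the real ids need not be contiguous), and the
parts are run from a single process with a thread pool rather than as ten
separate scripts served by Apache.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_range(total: int, parts: int) -> List[range]:
    """Split 0..total-1 into `parts` contiguous ranges of near-equal size."""
    base, extra = divmod(total, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` ranges get one additional item each.
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def run_parts(total: int, parts: int, check_part: Callable[[range], int]) -> int:
    """Run check_part on every chunk concurrently and sum the hit counts.

    check_part is supplied by the caller; it processes one chunk and
    returns how many hits it found.
    """
    chunks = split_range(total, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return sum(pool.map(check_part, chunks))
```

For example, split_range(1_000_000, 10) gives ten ranges of 100 thousand
positions each, the first being range(0, 100_000). Threads are used here
because the work is mostly waiting on the network; whether all ten actually
proceed in parallel still depends on the server at the other end.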
The problem is that I am able to get only two of these scripts running at
the same time, so it would still take me at least 5 days to get all the
results. The rest of the scripts just sit there in the server's backlog. As
far as I can tell, this is due to a limit on the number of processes Apache
will run at the same time. The server I am using to run these scripts and the
target web server both run Apache2. I am sure it's not a problem with the
receiving server; it has to be my own Apache web server, the one running the
scripts. I have tried mpm_winnt (on a Windows server) as well as the prefork
and worker modules (on a Linux server) without any luck. Have any of you ever
faced the same situation? Please help me out here, guys.
Best,
Tony Miller
PS: For those concerned about the legitimacy of this work, rest assured, this
is absolutely legit. There's nothing in the website's use policy which
prohibits doing this. Moreover, my client hired me to do this only because
the website owners were not able to hand over the data he required. They gave
the stupid reason that they cannot provide the data because they don't have a
system in place which would let them restrict a search like that!