On Tue, Apr 28, 2009 at 12:51:18PM -0600, Stephen John Smoogen wrote:
> On Tue, Apr 28, 2009 at 12:42 PM, Matt Domsch <Matt_Domsch@xxxxxxxx> wrote:
> > A few things I'd like to change starting tomorrow (post-freeze).
> >
> > * change the MM update-master-directory-list cronjob to start at 0 and
> >   30 past the hour, from its current schedule of trying to start every
> >   15 minutes.  It is taking about 20 minutes on average to run, so
> >   really is only running twice an hour anyhow.
> >
> > * bump back the MM update-mirrorlist cronjob to start at :40 past the
> >   hour.  It takes about 20 minutes to complete, and I would like the
> >   new content to land at the top of the hour.
>
> Is the 20 minutes a maximum or average? I was just wondering if
> somewhere between 35 and 40 would make sure it doesn't conflict with a
> job at the top of the hour?

Pretty much the maximum, though to be fair, I am not recording the
start and stop times for these events in their respective logfiles,
so I can't know for sure.

> > * increase the number of crawlers, from 45 to 75.  A full run is
> >   taking about 3 hours now, I'd like to bring this down to under 2.
> >   This only affects bapp1, whose load average is still under 1 and has
> >   plenty of free RAM and CPU it seems.
>
> sorry for clueless question number 2. What are the limiting factors for
> the crawlers? Network bandwidth/latency or CPU?

More latency than bandwidth.  The crawlers issue HTTP HEAD requests
for a lot of files on each mirror to be sure they match.  The latency
in response to these requests (single-threaded to each mirror, but
hitting 45 (or soon more) mirrors in parallel) is what limits the
speed of an individual crawler.  The time for the whole run is then
simply the time it takes for all of the crawlers to complete.

Very little CPU is used, except at the end of each crawler, when it
updates the database with its findings.  Then it jumps up in CPU for a
few seconds.
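As a rough illustration of why more parallel crawlers should shorten a
latency-bound run, here is a minimal Python sketch.  It is not the
MirrorManager code: the function name simulate_run and the per-mirror
durations are hypothetical, and it assumes each mirror is handled by
one crawler at a time and that a free slot picks up the next mirror
immediately.

```python
import heapq


def simulate_run(mirror_seconds, workers):
    """Return the wall-clock seconds needed to crawl every mirror.

    mirror_seconds: hypothetical per-mirror crawl durations, in the
                    order the mirrors are handed out.
    workers:        number of crawlers allowed to run in parallel.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")

    # Each heap entry is the time at which one crawler slot becomes free.
    free_at = [0.0] * min(workers, len(mirror_seconds))
    heapq.heapify(free_at)

    finish = 0.0
    for seconds in mirror_seconds:
        start = heapq.heappop(free_at)   # earliest available slot
        end = start + seconds            # that slot is busy until here
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


if __name__ == "__main__":
    # Made-up workload: 150 mirrors that each take 10 seconds to crawl.
    mirrors = [10.0] * 150
    print(simulate_run(mirrors, 45))
    print(simulate_run(mirrors, 75))
```

With this made-up workload the 45-crawler case needs four rounds and
the 75-crawler case needs two, so the simulated run time halves.  The
sketch also shows the limit of the approach: one very slow mirror sets
a floor on the run time no matter how many crawlers are added.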
-- 
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux