The new MM2 crawler disables mirrors which could not be crawled
successfully in 4 consecutive crawls. This seems to be a good way to
reduce the total number of crawls by removing mirrors which are just
too slow. Unfortunately the current default timeout of 2 hours is not
enough, especially for mirrors which carry more than one category, as
the timeout is per host and not per category. The problem is also not
network bound; it seems to be related to two crawlers updating the
directories of all mirrors in the same database at the same time.

To work around this timeout problem I am now starting the crawl on the
second crawler 3 hours later and I have also increased the timeout
from 2 hours to 3 hours.

Additionally a small fix is included to also crawl the last mirror in
the database, which was ignored until now.

After this is applied I would also re-enable the auto-disabled hosts in
the database.

Can I get two +1 for these changes?

Additionally I think we can remove the second crawler and just use the
times when the first crawler is idle to crawl other hosts. So instead of
starting a crawl every 12 hours on two crawlers we could crawl half of
the mirrors every 6 hours. But that is for after the freeze.

Adrian

commit 0dc0b70d9790b95cb6c1f41b4d36fb9aa2c9fbfc
Author: Adrian Reber <adrian@xxxxxxxx>
Date:   Wed May 13 11:23:21 2015 +0000

    Start the crawl later on the second crawler.

    Even with rsync as crawl method some hosts are taking a very long
    time to be crawled. The network connection with rsync is only open
    for a short time, but with both crawlers reading from and writing to
    the database it takes a very long time until the status of all
    directories is updated. Therefore this patch introduces a 3 hour
    delay of the crawl on the second crawler.

    This could also be solved with two different cron.d files; one for
    each crawler.

diff --git a/roles/mirrormanager/crawler/files/crawler.cron b/roles/mirrormanager/crawler/files/crawler.cron
index 3d695ca..c74b915 100644
--- a/roles/mirrormanager/crawler/files/crawler.cron
+++ b/roles/mirrormanager/crawler/files/crawler.cron
@@ -1,4 +1,8 @@
 # run the crawler twice a day
 # logs sent to /var/log/mirrormanager/crawler.log and crawl/* by default
 # 32GB of RAM is not enough for 75 threads, 38 seems to work so far
-0 */12 * * * mirrormanager /usr/bin/mm2_crawler --threads 38 `/usr/local/bin/run_crawler.sh 2` > /dev/null 2>&1
+#
+# [ "`hostname -s`" == "mm-crawler02" ] && sleep 3h is used to start the crawl
+# later on the second crawler to reduce the number of parallel accesses to
+# the database
+0 */12 * * * mirrormanager [ "`hostname -s`" == "mm-crawler02" ] && sleep 3h; /usr/bin/mm2_crawler --threads 38 `/usr/local/bin/run_crawler.sh 2` > /dev/null 2>&1

commit 06309516b88ffade6f00c78833bc62aa002d7f56
Author: Adrian Reber <adrian@xxxxxxxx>
Date:   Wed May 13 11:53:16 2015 +0000

    Increase crawler timeout from 2h to 3h.

    Since MM2 went into production, about 140 mirrors have been
    auto-disabled due to the crawler timing out after 2 hours (the
    default). Try whether it works better with 3 hours. This, in
    combination with the previous commit to decrease the load on the
    database, should help to auto-disable fewer good mirrors.

    Especially mirrors which mirror almost everything can hardly be
    crawled within the 2 hour limit. Unfortunately the limit is per
    host and not per category.

diff --git a/roles/mirrormanager/crawler/files/crawler.cron b/roles/mirrormanager/crawler/files/crawler.cron
index c74b915..66801d7 100644
--- a/roles/mirrormanager/crawler/files/crawler.cron
+++ b/roles/mirrormanager/crawler/files/crawler.cron
@@ -5,4 +5,4 @@
 # [ "`hostname -s`" == "mm-crawler02" ] && sleep 3h is used to start the crawl
 # later on the second crawler to reduce the number of parallel accesses on
 # the database
-0 */12 * * * mirrormanager [ "`hostname -s`" == "mm-crawler02" ] && sleep 3h; /usr/bin/mm2_crawler --threads 38 `/usr/local/bin/run_crawler.sh 2` > /dev/null 2>&1
+0 */12 * * * mirrormanager [ "`hostname -s`" == "mm-crawler02" ] && sleep 3h; /usr/bin/mm2_crawler --timeout-minutes 180 --threads 38 `/usr/local/bin/run_crawler.sh 2` > /dev/null 2>&1

commit 78c19a35d1706eb43e4e0c5fa202e0a6549674de
Author: Adrian Reber <adrian@xxxxxxxx>
Date:   Wed May 13 11:59:37 2015 +0000

    Also crawl the last mirror in the database.

    The last mirror in the database was not crawled and this adds '1' to
    the --stopid if necessary.

diff --git a/roles/mirrormanager/crawler/files/run_crawler.sh b/roles/mirrormanager/crawler/files/run_crawler.sh
index b9d642e..3269dea 100644
--- a/roles/mirrormanager/crawler/files/run_crawler.sh
+++ b/roles/mirrormanager/crawler/files/run_crawler.sh
@@ -26,4 +26,7 @@ for i in `seq 1 ${NUMBER_OF_CRAWLERS}`; do
     fi
     let STARTID=${STARTID}+${PART}
     let STOPID=${STOPID}+${PART}
+    if [ "${STOPID}" -eq "${MAX_HOST}" ]; then
+        let STOPID=${STOPID}+1
+    fi
 done
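
The off-by-one addressed by the last patch can be illustrated with a
small sketch. It is not the real run_crawler.sh: it assumes that the
crawler treats the stop ID as exclusive (which the commit message
suggests but does not state), and the helper names ids_in_range and
fix_stopid are hypothetical, introduced only for this illustration.

```shell
#!/bin/bash
# Hypothetical illustration, not part of run_crawler.sh.

# ids_in_range START STOP
# Prints the host IDs a crawler would visit, ASSUMING the stop ID is
# exclusive (START <= id < STOP).
ids_in_range() {
    seq "$1" $(( $2 - 1 ))
}

# fix_stopid STOPID MAX_HOST
# Mirrors the adjustment from the patch: if the stop ID equals the
# highest host ID, bump it by one so that host is included.
fix_stopid() {
    local stopid=$1 max_host=$2
    if [ "${stopid}" -eq "${max_host}" ]; then
        stopid=$(( stopid + 1 ))
    fi
    echo "${stopid}"
}

# With 10 hosts and a second range of 6..10:
ids_in_range 6 10 | tail -n 1                      # last ID visited: 9
ids_in_range 6 "$(fix_stopid 10 10)" | tail -n 1   # last ID visited: 10
```

Under that assumption the unpatched range stops one host short, while
the adjusted stop ID reaches the host with the highest ID. Ranges that
do not end at MAX_HOST are left unchanged.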