On Sat, May 03, 2014 at 11:40:49PM +0200, Pierre-Yves Chibon wrote:
> Hi Matt,
>
> I would like to run by you my understanding of how mirrormanager works. I hope
> I am not too far from the truth, but please let me know if I am :)
>
> Mirrormanager is split into three parts: the UI, the API and a cron task.

There are several cronjobs, but basically correct.

> The UI is the current TurboGears1 application [1]. People can log in there and
> register an institution (Site) with one or more sub-domains (Host), each
> mirroring one or more Products (Fedora, EPEL...).
> For each sub-domain (Host), there are a number of settings available, such as
> where the mirror pulls its updates from, who can pull from it, and
> restrictions on the user's country or network.
> Users manage their mirrors there; admins can access any Site/Host and update
> its settings as well.

Correct. This is the ugliest part of MM1.x, and I will be most glad to see
it completely rewritten.

> The API is a cgi script, called in our case by yum, which redirects the user to
> the closest mirror that is active, up to date, and allowed by the settings
> of the mirror.

Correct. This (the mirrorlist-server directory) is almost entirely divorced
from the TG 1.x components. It does not touch the database directly in any
way. Instead, it is handed a pickle containing a precalculated cache of the
database, which can then be distributed to each of the mirrorlist-server
servers. As long as you don't break the format of the pickle, you don't
have to change this code.

> The cron task runs daily and crawls the mirrors to check whether they are
> up to date, and registers in the database which folders are up
> to date and which are not for every single mirror. This cron task
> (or is it another?) also generates the publiclist pages [2] using
> the data it just retrieved about each mirror.

There are several different cronjobs.
1) crawlers, which check every directory on every mirror every few (~2-4)
   hours and update the database with which directories on each mirror
   are up to date. We don't mark whole servers as stale, only individual
   directories on individual servers.
2) generate the publiclist pages.
3) generate the mirrorlist-server pickle.
4) retrieve some external network routing data from public routers (both
   Internet2 and Internet1), used to identify which servers and clients
   are on which autonomous systems, and to select the proper server.
5) get the monthly GeoIP database from maxmind. This isn't part of MM
   proper but is necessary to run.

> I have been trying to figure out how exactly the mirrors on publiclist are
> retrieved. I found the query used by MM1 [3] and I was wondering if there
> would not be a way to simplify that. Would it be an option to have the cron
> task set a flag in the database saying whether a host is up to date or not?
> Or am I missing some of the information?

It is complicated because there are two main cases:

1) we track mirror freshness by individual directory, not by whole mirror.
   We will put a mirror on the publiclist if at least one of its
   directories has up-to-date content. The likelihood of a mirror having
   at least one directory stale throughout any given day is quite high, as
   content changes on the masters frequently; I don't want to keep adding
   and removing mirrors from the publiclist at that frequency. Changing to
   a host-global up-to-date flag would certainly simplify this logic, at a
   loss of granularity.

2) a few mirrors are special: they have the 'always_up2date' flag set on
   their HostCategory. These are mainly Fedora master mirrors that we
   don't want to crawl, but do want to appear in the publiclist. A second
   case is "trusted" mirrors (e.g. the ones I run at Dell) which, because
   of firewalls, cannot be hit by the crawlers but can serve content
   internally just fine.
   Those are also marked "always_up2date", and it's incumbent on me to
   make that assertion true.

> If we could simplify this part we could drop the cron task generating the
> publiclist pages and just display them on the fly as part of the UI.

I'd prefer not, but I'll leave that to the collective new maintenance team
to determine the desired output. :-)

> As part of the re-write, there is of course the UI since it is TurboGears1, but
> the CGI script and the cron tasks should not need many changes, should they?

Some of the cron tasks call into the TG1 model/controller code directly, so
yes they would. The CGI would not. There are a few other "helper apps" like
"move-development-to-release" which also call into the TG1 model/controller
code and would need to be updated.

Thanks,
Matt
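
For illustration only, the publiclist rule described above (a mirror is
listed if at least one of its directories is up to date, or if one of its
HostCategory entries carries the always_up2date flag) could be sketched in
Python roughly as follows. The Host and HostCategory classes, their fields,
and on_publiclist() are hypothetical stand-ins made up for this example;
they are not the MM1 model and not the query referenced in [3].

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HostCategory:
    """Hypothetical stand-in for one category of content served by a host."""
    name: str
    # Set by hand for mirrors the crawlers cannot or should not reach.
    always_up2date: bool = False
    # Directory path -> True if the last crawl found that directory current.
    dir_up2date: Dict[str, bool] = field(default_factory=dict)


@dataclass
class Host:
    """Hypothetical stand-in for a single mirror host."""
    name: str
    categories: List[HostCategory] = field(default_factory=list)


def on_publiclist(host: Host) -> bool:
    """Return True if the host qualifies for the publiclist.

    A host qualifies when any category is flagged always_up2date, or when
    at least one crawled directory in any category is up to date.
    """
    for category in host.categories:
        if category.always_up2date:
            return True
        if any(category.dir_up2date.values()):
            return True
    return False
```

A host-global up-to-date flag would replace the per-directory dictionary
with a single boolean, which is simpler but loses the granularity discussed
above.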