https://fedorahosted.org/fedora-infrastructure/ticket/3268 notes that a mirror might not be removed from the list even though it's stale.

In particular, there is a code path, add_parents(), whose job is to mark all parent directories of a target directory as up-to-date or not, if those parent directories had not already been determined to be up-to-date on their own. This can happen if a directory has no files in it, for example if it contains only child directories. This code path had an incorrect key lookup, specifically:

-        parent = '/'.join(splitpath[:-1])
-        try:
-            hcd = host_category_dirs[(hc, parent)]

which looks up the parent directory in the host_category_dirs cache (which is operated on later). However, the actual key here is not the string form of the parent directory name; it is a Directory object. So the code looks up the wrong thing, fails the lookup, and then proceeds to mark all of the parent directories up-to-date incorrectly. In particular, it marks all parent directories (e.g. pub/epel/5/i386) up-to-date when a child subdirectory (pub/epel/5/i386/repoview/layout) is marked up-to-date, even if the parent directory is not in fact up-to-date.

The patch below fixes this by splitting the parent-directory lookup out into its own function, parent(), for readability, and by fixing the key lookup. It also records a newly added parent with an unknown up-to-date state (None) instead of True.

I've tested this on bapp02 against a stale mirror that was previously marked up-to-date incorrectly, and it fixes the problem. I'd like to hotfix bapp02 to address this.
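To make the key mismatch concrete, here is a minimal stand-alone sketch that runs outside MirrorManager. It assumes a hypothetical Directory class whose registry dict stands in for the SQLObject Directory table and Directory.byName(); the function names add_parents_old/add_parents_new, the hc value and the example paths are likewise illustrative only. add_parents_old is a simplified re-creation of the pre-patch control flow, not the actual code.

```python
# Hypothetical stand-ins: Directory and its registry replace the SQLObject
# Directory table; hc, topdir and the paths are illustrative only.


class Directory:
    registry = {}  # name -> Directory, standing in for the database table

    def __init__(self, name):
        self.name = name
        Directory.registry[name] = self

    def __repr__(self):
        return "Directory(%r)" % self.name


def parent(directory):
    """Return the parent Directory object, or None if it does not exist."""
    parts = directory.name.split("/")
    if len(parts) > 1:
        return Directory.registry.get("/".join(parts[:-1]))
    return None


def add_parents_old(cache, hc, d, topdir):
    """Simplified re-creation of the old (buggy) control flow."""
    p = parent(d)
    parent_path = "/".join(d.name.split("/")[:-1])
    # Wrong key type: the cache is keyed by Directory objects, not strings,
    # so this test always passes and the parent is forced to True.
    if (hc, parent_path) not in cache and p is not None:
        cache[(hc, p)] = True
        if p is not topdir:
            return add_parents_old(cache, hc, p, topdir)
    return cache


def add_parents_new(cache, hc, d, topdir):
    """Mirrors the patched add_parents(): unseen parents get None."""
    p = parent(d)
    if p is not None:
        if (hc, p) not in cache:  # key is (hc, Directory object)
            cache[(hc, p)] = None  # up-to-date state not yet known
        if p is not topdir:  # stop at the top of the category
            return add_parents_new(cache, hc, p, topdir)
    return cache


paths = ["pub", "pub/epel", "pub/epel/5", "pub/epel/5/i386",
         "pub/epel/5/i386/repoview", "pub/epel/5/i386/repoview/layout"]
dirs = {name: Directory(name) for name in paths}
topdir = dirs["pub/epel"]
hc = "host-category-1"
leaf = dirs["pub/epel/5/i386/repoview/layout"]
i386 = dirs["pub/epel/5/i386"]


def crawl_result():
    # The crawler found the leaf up to date but the i386 directory stale.
    return {(hc, leaf): True, (hc, i386): False}


old = add_parents_old(crawl_result(), hc, leaf, topdir)
new = add_parents_new(crawl_result(), hc, leaf, topdir)
```

With these inputs, the old walk overwrites the stale i386 entry with True, so the stale mirror looks up-to-date; the patched walk leaves it False and records repoview, pub/epel/5 and the category top directory as None.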
Thanks,
Matt

-- 
Matt Domsch
Technology Strategist
Dell | Office of the CTO

--- crawler_perhost	2010-09-06 14:46:21.000000000 +0000
+++ crawler_perhost	2012-05-12 01:20:54.604906708 +0000
@@ -348,21 +348,24 @@
                 break
     return pref
 
-
-def add_parents(host_category_dirs, hc, d):
-    splitpath = d.name.split('/')
+def parent(directory):
+    parentDir = None
+    splitpath = directory.name.split(u'/')
     if len(splitpath[:-1]) > 0:
-        parent = '/'.join(splitpath[:-1])
+        parentPath = u'/'.join(splitpath[:-1])
         try:
-            hcd = host_category_dirs[(hc, parent)]
-        except KeyError:
-            try:
-                parentDir = Directory.byName(parent)
-                host_category_dirs[(hc, parentDir)] = True
-            except SQLObjectNotFound: # recursed out of the directory structure
-                parentDir = None
-
-        if parentDir and parentDir != hc.category.topdir: # stop at top of the category
+            parentDir = Directory.byName(parentPath)
+        except SQLObjectNotFound:
+            pass
+    return parentDir
+
+def add_parents(host_category_dirs, hc, d):
+    parentDir = parent(d)
+    if parentDir is not None:
+        if (hc, parentDir) not in host_category_dirs:
+            print "directory %s adding parent %s, unknown up2date state" % (d.name, (hc, parentDir))
+            host_category_dirs[(hc, parentDir)] = None
+        if parentDir != hc.category.topdir: # stop at top of the category
             return add_parents(host_category_dirs, hc, parentDir)
 
     return host_category_dirs

_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure