Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> writes: > +/* A hook for count-objects to report invalid files in pack directory */ > +extern void (*report_garbage)(const char *desc, const char *path, int len, const char *name); We may want to document the strange way the last three parameters are used somewhere. e.g. shows "path" (if "name" is NULL), or prepends "path" in front of name (otherwise); only for the latter, "path" can be a string that is not NUL-terminated but its length specified with "len" and in that case a slash is inserted between the path and the "name". When described clearly, it sounds somewhat ugly and incoherent API, even though it covers the immediate need X-<. > + sort_string_list(list); > + > + for (p = packed_git; p; p = p->next) { > + struct string_list_item *item; > + if (!p->pack_local) > + continue; > + strbuf_reset(&sb); > + strbuf_add(&sb, p->pack_name, > + strlen(p->pack_name) - 5); /* ".pack" */ > + item = string_list_lookup(list, sb.buf); > + if (!item) > + continue; > + /* > + * string_list_lookup does not guarantee to return the > + * first matched string if it's duplicated. > + */ Do we need to even allow duplication? Why does prepare_packed_git_one() below have to split the pathname into the base and extension in the first place? If you collect all pathnames that end with known extensions (".idx", ".pack" and ".keep") in the sorted "garbage" list as separate entries in prepare_packed_git_one(), report_pack_garbage() can do the right thing without trusting or iterating over packed_git list at all, I think. As the entries are sorted, .pack, .idx and all other valid .ext are grouped together for the same basename. If you see a group that have both .pack and .idx, the group is good. Otherwise, everybody in the group is bad (e.g. a lonely .pack file without .idx is an unusable garbage). How about doing it something along this line, perhaps? int i; int beginning_of_this_name = -1; int seen_bits = 0; /* 01 for .idx, 02 for .pack */ for (i = 0; i < list->nr; i++) { if (beginning_of_this_name < 0) beginning_of_this_name = i; else if (list->items[i] and list->items[beginning_of_this_name] share the same basename) ; /* keep scanning */ else { /* one name ended at (i-1) */ if (seen_bits == 3) ; /* both .idx and .pack exist; good */ else report_garbage_for_one_name(list, beginning_of_this_name, i, seen_bits); seen_bits = 0; beginning_of_this_name = i; } if (list->items[i] is ".idx") seen_bits |= 1; if (list->items[i] is ".pack") seen_bits |= 2; } if (0 <= beginning_of_this_name && seen_bits != 3) report_garbages_for_one_name(list, beginning_of_this_name, list->nr, seen_bits); with a helper function report_garbage_for_one_name() that would look like this: report_garbage_for_one_name(...) { int j; const char *msg; switch (seen_bits) { case 0: msg = "no corresponding .idx nor .pack"; break; case 1: msg = "no corresponding .pack"; break; case 2: msg = "no corresponding .idx; break; } for (j = beginning_of_this_name; j < i; j++) report_garbage(msg, list->items[j]); } For the above to work, prepare_packed_git_one() needs to retain only the paths with known extensions in garbage list. "pack-deadbeef.unk" can and should be reported as a garbage immediately when it is seen without being placed in the list. > @@ -1045,9 +1100,33 @@ static void prepare_packed_git_one(char *objdir, int local) > if (!p) > continue; > install_packed_git(p); > - } > + } else if (!report_garbage) { > + /* > + * the rest of this if-chain requires > + * report_garbage != NULL. Stop the chain if > + * report_garbage is NULL. > + */ > + ; > + } else if (has_extension(de->d_name, ".pack")) { > + struct string_list_item *item; > + int n = strlen(path) - 5; > + item = string_list_append_nodup(&garbage, > + xstrndup(path, n)); > + item->util = ".pack"; > + continue; > + } else if (has_extension(de->d_name, ".idx")) { > + struct string_list_item *item; > + int n = strlen(path) - 4; > + item = string_list_append_nodup(&garbage, > + xstrndup(path, n)); > + item->util = ".idx"; > + continue; > + } else > + report_garbage("garbage found", path, 0, NULL); Hmm, where is a ".keep" file handled in this flow? The structure of the if/else cascade is much nicer than the earlier iterations, but wouldn't it be even more clear to do this? if (is .idx file) { ... do that .idx thing ... } if (!report_garbage) continue; /* it does not matter what the file is */ if (is .pack) { ... remember that we saw this .pack ... } else if (is .idx) { ... remember that we saw this .idx ... } else if (is .keep) { ... remember that we saw this .keep ... } else { ... all else --- report as garbage immediately ... } -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html