Re: [PATCH v3 3/4] count-objects: report garbage files in pack directory too

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> writes:

> +/* A hook for count-objects to report invalid files in pack directory */
> +extern void (*report_garbage)(const char *desc, const char *path, int len, const char *name);

We may want to document the strange way the last three parameters
are used somewhere.  e.g.

	shows "path" (if "name" is NULL), or prepends "path" in
	front of name (otherwise); only for the latter, "path" can
	be a string that is not NUL-terminated but its length
	specified with "len" and in that case a slash is inserted
	between the path and the "name".

When described clearly, it sounds somewhat ugly and incoherent API,
even though it covers the immediate need X-<.

> +	sort_string_list(list);
> +
> +	for (p = packed_git; p; p = p->next) {
> +		struct string_list_item *item;
> +		if (!p->pack_local)
> +			continue;
> +		strbuf_reset(&sb);
> +		strbuf_add(&sb, p->pack_name,
> +			   strlen(p->pack_name) - 5); /* ".pack" */
> +		item = string_list_lookup(list, sb.buf);
> +		if (!item)
> +			continue;
> +		/*
> +		 * string_list_lookup does not guarantee to return the
> +		 * first matched string if it's duplicated.
> +		 */

Do we need to even allow duplication?  Why does prepare_packed_git_one()
below have to split the pathname into the base and extension in the first
place?

If you collect all pathnames that end with known extensions (".idx",
".pack" and ".keep") in the sorted "garbage" list as separate entries in
prepare_packed_git_one(), report_pack_garbage() can do the right thing
without trusting or iterating over packed_git list at all, I think.  As
the entries are sorted, .pack, .idx and all other valid .ext are grouped
together for the same basename.  If you see a group that have both .pack
and .idx, the group is good.  Otherwise, everybody in the group is bad
(e.g. a lonely .pack file without .idx is an unusable garbage).

How about doing it something along this line, perhaps?

	int i;
	int beginning_of_this_name = -1;
        int seen_bits = 0; /* 01 for .idx, 02 for .pack */
	for (i = 0; i < list->nr; i++) {
        	if (beginning_of_this_name < 0)
                	beginning_of_this_name = i;
		else if (list->items[i] and list->items[beginning_of_this_name]
                	 share the same basename)
			; /* keep scanning */
		else {
                	/* one name ended at (i-1) */
			if (seen_bits == 3)
				; /* both .idx and .pack exist; good */
			else
			        report_garbage_for_one_name(list, beginning_of_this_name, i,
                                		seen_bits);
			seen_bits = 0;
                        beginning_of_this_name = i;
		}
                if (list->items[i] is ".idx")
			seen_bits |= 1;
                if (list->items[i] is ".pack")
			seen_bits |= 2;

	}
	if (0 <= beginning_of_this_name && seen_bits != 3)
	        report_garbages_for_one_name(list, beginning_of_this_name, list->nr, seen_bits);

with a helper function report_garbage_for_one_name() that would look like this:

	report_garbage_for_one_name(...) {
                int j;
                const char *msg;
                switch (seen_bits) {
                case 0: msg = "no corresponding .idx nor .pack"; break;
                case 1: msg = "no corresponding .pack"; break;
                case 2: msg = "no corresponding .idx; break;
                }
                for (j = beginning_of_this_name; j < i; j++)
                        report_garbage(msg, list->items[j]);
	}

For the above to work, prepare_packed_git_one() needs to retain only the
paths with known extensions in garbage list. "pack-deadbeef.unk" can and
should be reported as a garbage immediately when it is seen without being
placed in the list.

> @@ -1045,9 +1100,33 @@ static void prepare_packed_git_one(char *objdir, int local)
>  			if (!p)
>  				continue;
>  			install_packed_git(p);
> -		}
> +		} else if (!report_garbage) {
> +			/*
> +			 * the rest of this if-chain requires
> +			 * report_garbage != NULL. Stop the chain if
> +			 * report_garbage is NULL.
> +			 */
> +			;
> +		} else if (has_extension(de->d_name, ".pack")) {
> +			struct string_list_item *item;
> +			int n = strlen(path) - 5;
> +			item = string_list_append_nodup(&garbage,
> +							 xstrndup(path, n));
> +			item->util = ".pack";
> +			continue;
> +		} else if (has_extension(de->d_name, ".idx")) {
> +			struct string_list_item *item;
> +			int n = strlen(path) - 4;
> +			item = string_list_append_nodup(&garbage,
> +							xstrndup(path, n));
> +			item->util = ".idx";
> +			continue;
> +		} else
> +			report_garbage("garbage found", path, 0, NULL);

Hmm, where is a ".keep" file handled in this flow?

The structure of the if/else cascade is much nicer than the earlier
iterations, but wouldn't it be even more clear to do this?

	if (is .idx file) {
		... do that .idx thing ...
	}

	if (!report_garbage)
        	continue; /* it does not matter what the file is */

	if (is .pack) {
		... remember that we saw this .pack ...
	} else if (is .idx) {
		... remember that we saw this .idx ...
	} else if (is .keep) {
		... remember that we saw this .keep ...
        } else {
        	... all else --- report as garbage immediately ...
	}
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]