Re: [PATCH 3/3] gc: Clean garbage .bitmap files from pack dir

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Dec 18, 2015 at 06:06:40PM -0600, Doug Kelly wrote:

> Similar to cleaning up excess .idx files, clean any garbage .bitmap
> files that are not otherwise associated with any .idx/.pack files.
> 
> Signed-off-by: Doug Kelly <dougk.ff7@xxxxxxxxx>
> ---
>  builtin/gc.c     | 12 ++++++++++--
>  t/t5304-prune.sh |  2 +-
>  2 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/builtin/gc.c b/builtin/gc.c
> index c583aad..7ddf071 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -58,8 +58,16 @@ static void clean_pack_garbage(void)
>  
>  static void report_pack_garbage(unsigned seen_bits, const char *path)
>  {
> -	if (seen_bits == PACKDIR_FILE_IDX)
> -		string_list_append(&pack_garbage, path);
> +	if (seen_bits & PACKDIR_FILE_IDX ||
> +	    seen_bits & PACKDIR_FILE_BITMAP) {
> +		const char *dot = strrchr(path, '.');
> +		if (dot) {
> +			int baselen = dot - path + 1;
> +			if (!strcmp(path+baselen, "idx") ||
> +				!strcmp(path+baselen, "bitmap"))
> +				string_list_append(&pack_garbage, path);
> +		}
> +	}
>  }

Hmm. Thinking on this further, do we actually need to check seen_bits
here at all?

The original was trying to ask "is this a .idx file" by checking
seen_bits.  That was actually broken by the first patch in this series
for some cases, as we might send more bits. E.g., if you have "foo.idx"
and "foo.pack", this function will get called twice (once per file), but
with seen_bits set to IDX|BITMAP for both cases. And we would not match
the "==" above, and would therefore fail to trigger.

That case is re-fixed by this patch, which is good. But I think
seen_bits is not really telling us anything at this point. We know it's
a garbage case, or else report_helper wouldn't have passed it along to
us. But we care only about the extension in the path, which is what
distinguishes each individual call to this function.

So we can just check that.  I also think the logic may be clearer if we
handle each extension exhaustively, like:

  /* We know these are useless without the matching .pack */
  if (ends_with(path, ".bitmap") || ends_with(path, ".idx")) {
          string_list_append(&pack_garbage, path);
	  return;
  }

  /*
   * A pack without other files cannot be used, but should be saved,
   * as this is a recoverable situation (we may even see it racily
   * as new packs come into existence).
   */
  if (ends_with(path, ".pack"))
	  return;

  /*
   * A .keep file is useless without the matching pack, but it
   * _could_ contain information generated by the user. Let's keep it.
   * In the future, we may expand this to look for obvious leftover
   * receive-pack locks and drop them.
   */
  if (ends_with(path, ".keep"))
          return;

  /*
   * A totally unrelated garbage file should be kept, to err
   * on the conservative side.
   */
  if (seen_bits & PACKDIR_FILE_GARBAGE)
	return;

  /*
   * We have a file type that the garbage-reporting functions
   * know about but we don't. This function needs updating.
   */
  die("BUG: report_pack_garbage confused");

> -test_expect_failure 'clean pack garbage with gc' '
> +test_expect_success 'clean pack garbage with gc' '
>  	test_when_finished "rm -f .git/objects/pack/fake*" &&
>  	test_when_finished "rm -f .git/objects/pack/foo*" &&
>  	: >.git/objects/pack/foo.keep &&

And I think here we should make sure that we are covering the above
situations (and especially that we are keeping files that should be
kept).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]