Re: [PATCH 3/4] t5304: Ensure wanted files are not deleted

On Wed, Jan 13, 2016 at 2:55 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Doug Kelly <dougk.ff7@xxxxxxxxx> writes:
>
>> Subject: Re: [PATCH 3/4] t5304: Ensure wanted files are not deleted
>
> I'd suggest s/wanted/non-garbage/.
>

I'm okay with this.

>> Explicitly test for and ensure files that may be wanted are not
>> deleted during a gc operation.  These include .pack without .idx
>> (which may be in-flight), garbage in the directory, and .keep files
>> the user created.
>
> "garbage in the directory" is not well defined.  "files in the
> directory that clearly are not related to packing" is probably what
> you meant, but the definition of "related to packing" is still
> fuzzy.  Please clarify.

That's a good point.  Perhaps the paragraph would be clearer if it were
reworded to something like this:

Explicitly test that a gc operation does not delete files that the user
may want or that may not be garbage.  These include .pack files missing
a corresponding .idx file (possibly because the pack is still in-flight),
.keep files created by the user, and other unrecognized files in the pack
directory.  These files will still be reported by "git count-objects -v",
but should not be removed automatically by gc.  Only files we are
absolutely sure are unnecessary will be removed as part of the gc
process.

>
> The following is me thinking aloud about things that you would need
> to think about while attempting to clarify this.
>
> What should the code do if we find
>
>     pack-b0a9d62a7471e58832a575a78d57f8fb26822125.frotz
>
> in $GIT_OBJECT_DIRECTORY/pack/ directory?  Is it a "garbage in the
> directory"?  The filename looks so similar to the usual things with
> known suffixes .pack, .idx, .bitmap, and .keep, that we may want to
> consider that it might be another file related to the packing left
> by a future version of Git and if we do not see corresponding .pack
> we would want to remove it?  Or do we want to do something else?
>
> What should "gc" do if we find
>
>     pack-frotz.idx
>
> without corresponding ".pack"?  Wouldn't it be safer to consider it
> garbage unrelated to packing (because regular packing would have
> given it a 40-hex name, not "frotz") and leave it undeleted?
>

I think the reworded paragraph above explains what we're doing and why.
In your first example, a valid-looking pack file name with an unknown
extension may be flagged as "garbage," but should not be deleted
during the gc.  As for the second, we decided that an .idx file with no
corresponding .pack is safe to delete (since the pack is written before
the idx, and the original performance problem came from scanning a large
number of idx files).
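
To make the policy described above concrete, here is a rough shell sketch.
It is not Git's actual implementation; classify_pack_file is a hypothetical
helper name, and it covers only the cases mentioned in this message (other
suffixes such as .bitmap are not treated specially here).  It prints "keep"
for files that look fine, "report" for files that count-objects would warn
about but gc should leave alone, and "remove" for files gc may delete.

```shell
#!/bin/sh
# Sketch only: mirrors the policy discussed in this thread, not Git's C code.
# classify_pack_file is a hypothetical name used for illustration.
classify_pack_file () {
	file=$1
	base=${file%.*}		# path without its extension
	case "$file" in
	*.idx)
		# The pack is written before its index, so an .idx with
		# no .pack is considered stale and safe to delete.
		if test -e "$base.pack"
		then
			echo keep
		else
			echo remove
		fi
		;;
	*.pack)
		# A .pack without .idx may still be in-flight: warn, keep.
		if test -e "$base.idx"
		then
			echo keep
		else
			echo report
		fi
		;;
	*.keep)
		# A lone .keep may have been created by the user: warn, keep.
		if test -e "$base.pack" || test -e "$base.idx"
		then
			echo keep
		else
			echo report
		fi
		;;
	*)
		# Unknown extension: flag as garbage but never delete.
		echo report
		;;
	esac
}
```

For the files created in the new test (foo.keep, fake.pack, fake2.foo), this
sketch prints "report" in each case, which corresponds to the three warnings
the test expects while all three files survive the gc.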

I'm not dismissing the idea of treating files differently based on the
base filename (without the extension).  Currently, though, the logic that
removes pack garbage doesn't look at the base name: it considers only the
extension and which related files are found in the directory.  Whether
this is good or bad, I'm not sure.  It certainly does what I need at
fairly low risk, though.
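
If we did want to take the base name into account, a check along the
following lines could be a starting point.  This is purely illustrative:
looks_like_pack_name is a hypothetical helper, and the "pack-" plus 40-hex
pattern is an assumption based on your remark about regular packing; the
current code does nothing like this.

```shell
#!/bin/sh
# Sketch only: looks_like_pack_name is a hypothetical helper, not part of Git.
# Succeeds if the base name (directory and extension stripped) has the
# form "pack-" followed by exactly 40 lowercase hex digits.
looks_like_pack_name () {
	name=${1##*/}		# strip leading directories
	name=${name%.*}		# strip the extension, if any
	printf '%s\n' "$name" | grep -Eq '^pack-[0-9a-f]{40}$'
}
```

With this, pack-b0a9d62a7471e58832a575a78d57f8fb26822125.frotz would pass
the check and pack-frotz.idx would fail it, so the two examples could be
told apart regardless of extension.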

Does this help clarify the situation more?

> Thanks.
>
>> Signed-off-by: Doug Kelly <dougk.ff7@xxxxxxxxx>
>> ---
>>  t/t5304-prune.sh | 17 +++++++++++++++++
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
>> index 4fa6e7a..f7c380c 100755
>> --- a/t/t5304-prune.sh
>> +++ b/t/t5304-prune.sh
>> @@ -285,6 +285,23 @@ EOF
>>       test_cmp expected actual
>>  '
>>
>> +test_expect_success 'ensure unknown garbage kept with gc' '
>> +     test_when_finished "rm -f .git/objects/pack/fake*" &&
>> +     test_when_finished "rm -f .git/objects/pack/foo*" &&
>> +     : >.git/objects/pack/foo.keep &&
>> +     : >.git/objects/pack/fake.pack &&
>> +     : >.git/objects/pack/fake2.foo &&
>> +     git gc &&
>> +     git count-objects -v 2>stderr &&
>> +     grep "^warning:" stderr | sort >actual &&
>> +     cat >expected <<\EOF &&
>> +warning: garbage found: .git/objects/pack/fake2.foo
>> +warning: no corresponding .idx or .pack: .git/objects/pack/foo.keep
>> +warning: no corresponding .idx: .git/objects/pack/fake.pack
>> +EOF
>> +     test_cmp expected actual
>> +'
>> +
>>  test_expect_success 'prune .git/shallow' '
>>       SHA1=`echo hi|git commit-tree HEAD^{tree}` &&
>>       echo $SHA1 >.git/shallow &&