Re: 2.22.0 repack -a duplicating pack contents

On Mon, Jun 24 2019, Jeff King wrote:

> On Sun, Jun 23, 2019 at 06:08:25PM +0000, Eric Wong wrote:
>
>> > I'm not sure of the right solution. For maximal backwards-compatibility,
>> > the default for bitmaps could become "if not bare and if there are no
>> > .keep files". But that would mean bitmaps sometimes not getting
>> > generated because of the problems that ee34a2bead was trying to solve.
>> >
>> > That's probably OK, though; you can always flip the bitmap config to
>> > "true" yourself if you _must_ have bitmaps.
>>
>> What about something like this?  Needs tests but I need to leave, now.
>
> Yeah, I think that's the right direction.
>
> Though...
>
>> +static int has_pack_keep_file(void)
>> +{
>> +	DIR *dir;
>> +	struct dirent *e;
>> +	int found = 0;
>> +
>> +	if (!(dir = opendir(packdir)))
>> +		return found;
>> +
>> +	while ((e = readdir(dir)) != NULL) {
>> +		if (ends_with(e->d_name, ".keep")) {
>> +			found = 1;
>> +			break;
>> +		}
>> +	}
>> +	closedir(dir);
>> +	return found;
>> +}
>
> I think this can be replaced with just checking p->pack_keep for each
> item in the packed_git list.
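
For illustration, that list walk might look roughly like the sketch
below. The struct is a cut-down stand-in for git's real struct
packed_git (only "next" and "pack_keep" are modeled), and
has_kept_pack() is a hypothetical name, not something from the patch.

```c
#include <stddef.h>

/*
 * Simplified stand-in for git's struct packed_git (assumption: only
 * the two fields needed for this check are modeled here).
 */
struct packed_git {
	struct packed_git *next;
	unsigned pack_keep:1;	/* set when the pack has a .keep file */
};

/* Hypothetical helper: return 1 if any pack in the list is kept. */
static int has_kept_pack(const struct packed_git *packs)
{
	const struct packed_git *p;

	for (p = packs; p; p = p->next)
		if (p->pack_keep)
			return 1;
	return 0;
}
```

This avoids the extra opendir()/readdir() pass, at the cost of relying
on whatever the pack list recorded when it was loaded.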
>
> That's racy, but then so is your code here, since it's really the child
> pack-objects which is going to deal with the .keep. I don't think we
> need to care much about the race, though. Either:
>
>   1. Somebody has made an old intentional .keep, which would not be
>      racy. We'd see it in both places.
>
>   2. Somebody _just_ made an intentional .keep; we'll race with that and
>      maybe duplicate objects from the kept pack. But this is a rare
>      occurrence, and there's no real ordering promise here anyway with
>      somebody creating .keep files alongside a running repack.
>
>   3. An incoming fetch/push may create a .keep file as a temporary lock,
>      which we see here but which goes away by the time pack-objects
>      runs. That's OK; we err on the side of not generating bitmaps, but
>      they're an optimization anyway (and if you really insist on having
>      them, you should tell Git to definitely make them instead of
>      relying on this default behavior).

This sort of thing (#3) strikes me as a fairly pathological case we
should try to avoid. Now that we've turned on bitmaps by default, people
will take the sort of performance increase noted in [1] for granted.

So they'll be happily running with that & then get a CPU/IO spike when
the *.bitmap files they'd been implicitly relying on for years in their
default config go away, only to have them reappear the next time
"repack" runs.

I can't think of a great solution for this case, but some thoughts:

 a) Perhaps we should split the *.keep flag into two or more
    things.

    We're using it for all of "I want this *.pack forever"
    (e.g. debugging) and "I want only this *.pack to contain the data
    found in it" (I/O & CPU optimization, what Janos wants) and "I'm
    git.git code avoiding a race with myself" (what you describe in #3).

    So maybe for the last of those we could also use and understand
    *.tmp-keep, at which point we wouldn't have the race described in
    #3. The first of those would be *.noprune and the second
    *.highlander (but whether it's worth splitting all that out vs.
    just having *.tmp-keep is another matter).

 b) Shouldn't we at least print some warning to STDERR in this case so
    e.g. gc.log will note the performance degradation of the repo in its
    current configuration?
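
To make (a) a bit more concrete, here is a rough sketch of how the
marker files could be told apart by suffix. Only *.keep exists today;
*.tmp-keep, *.noprune and *.highlander are the hypothetical names from
above, and classify_keep_marker() and counts_against_bitmaps() are
made-up helpers, not anything in git.git.

```c
#include <string.h>

enum keep_kind {
	KEEP_NONE,		/* not a marker file at all */
	KEEP_LEGACY,		/* today's catch-all *.keep */
	KEEP_TMP,		/* hypothetical *.tmp-keep (transient lock) */
	KEEP_NOPRUNE,		/* hypothetical *.noprune */
	KEEP_HIGHLANDER		/* hypothetical *.highlander */
};

/* Return 1 if "name" ends with "suffix", else 0. */
static int has_suffix(const char *name, const char *suffix)
{
	size_t nlen = strlen(name), slen = strlen(suffix);

	return nlen >= slen && !strcmp(name + nlen - slen, suffix);
}

/* Map a file name in the pack directory to the kind of marker it is. */
static enum keep_kind classify_keep_marker(const char *name)
{
	if (has_suffix(name, ".tmp-keep"))
		return KEEP_TMP;
	if (has_suffix(name, ".noprune"))
		return KEEP_NOPRUNE;
	if (has_suffix(name, ".highlander"))
		return KEEP_HIGHLANDER;
	if (has_suffix(name, ".keep"))
		return KEEP_LEGACY;
	return KEEP_NONE;
}

/*
 * Assumption for this sketch: every marker except the transient
 * *.tmp-keep turns off the bitmap default, so the temporary lock
 * from #3 no longer affects the decision.
 */
static int counts_against_bitmaps(enum keep_kind kind)
{
	return kind != KEEP_NONE && kind != KEEP_TMP;
}
```

Whether *.noprune should really count against bitmaps is exactly the
sort of thing that would need deciding if the flag were split.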

>   4. Like (3), but we _don't_ see the temporary .keep here but _do_ see
>      it during pack-objects. That's OK, because we'll have told
>      pack-objects to pack those objects anyway, which is the right
>      thing.
>
> -Peff

1. https://github.blog/2015-09-22-counting-objects/


