Re: tb/cruft-packs (was Re: What's cooking in git.git (Mar 2022, #01; Thu, 3))

On 3/7/2022 3:18 PM, Jonathan Nieder wrote:
> Derrick Stolee wrote:
>> On 3/7/2022 1:18 PM, Taylor Blau wrote:
>>> On Mon, Mar 07, 2022 at 10:06:00AM -0800, Jonathan Nieder wrote:
> 
>>>>  2. Marking this as a repository format extension so it doesn't interact
>>>>     poorly with Git implementations (including older versions of Git
>>>>     itself) that are not aware of the new feature
>>>
>>> The design of cruft packs was done intentionally to avoid needing a
>>> format extension. The cruft pack is "just a pack" to any older version
>>> of Git. The only thing an older version of Git wouldn't understand is
>>> how to interpret the .mtimes file. But that's no different than the
>>> current behavior without cruft packs, where any unreachable object
>>> inherits the mtime of its containing pack.
>>>
>>> So an older version of Git might prune a different set of objects than a
>>> version that understands cruft packs depending on the contents of the
>>> .mtimes file, the mtime of the cruft pack, and the width of the grace
>>> period. But I think by downgrading you are more or less buying into the
>>> existing behavior. So I don't think there is a compelling reason to
>>> introduce a format extension here.
>>
>> In particular, older versions would first explode unreachable objects
>> out of the cruft pack and into loose objects before expiring any of
>> them based on the loose object mtime. There is no risk here of causing
>> problems with older versions of Git and does not need an extension.
> 
> Surely when older and newer versions are acting on the same repository,
> they would fight by exploding out unreachable objects, packing them
> back into a cruft pack, etc, no?

You are referring to a situation where there are multiple possible
versions responsible for maintaining a repository. Git does not
support parallel writers doing significant updates like full
repacks and GCs and instead relies on the user to control the
concurrency there. The standard we keep to is that parallel readers
can still access the repo during this time.

If someone were running parallel maintenance processes like this,
they would already be risking failure with existing features (though
in those cases it is the old version breaking the new one). What if
the new and old versions differ in their understanding of the
commit-graph? The old one could remove commits without updating the
commit-graph, leaving extra commits in that file that the new one
would fail to verify. How about the multi-pack-index? The new version
would try loading objects from missing pack-files, since the old
version deleted those packs without updating the multi-pack-index.

At least with cruft packs, the worst case is that no objects are
ever expired because they keep toggling between loose objects and
cruft packs.
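
To make that worst case concrete, here is a toy model in Python. It is
only a sketch under stated assumptions, not Git's actual code: the
names (Repo, new_gc, old_gc, simulate), the 14-day grace period, and
the maintenance schedule are all made up for illustration. It assumes
the new version records per-object mtimes when it writes a cruft pack,
and that the old version ignores .mtimes, so exploded objects inherit
the mtime of the freshly written pack.

```python
from dataclasses import dataclass, field

GRACE = 14  # hypothetical grace period, in days


@dataclass
class Repo:
    # object id -> mtime (day number) of each loose unreachable object
    loose: dict = field(default_factory=dict)
    # object id -> per-object mtime, as a .mtimes file would record it
    cruft: dict = field(default_factory=dict)
    # mtime of the cruft pack file itself
    cruft_pack_mtime: int = 0


def new_gc(repo: Repo, today: int) -> None:
    """Model of a cruft-aware GC: keep per-object mtimes, prune old ones."""
    merged = {**repo.cruft, **repo.loose}
    repo.cruft = {oid: m for oid, m in merged.items() if today - m < GRACE}
    repo.loose = {}
    # Assumption: the cruft pack is rewritten, so its file mtime is "now".
    repo.cruft_pack_mtime = today


def old_gc(repo: Repo, today: int) -> None:
    """Model of a cruft-unaware GC: explode the pack, then prune loose."""
    # The .mtimes data is ignored; each object inherits the pack's mtime.
    for oid in repo.cruft:
        repo.loose[oid] = repo.cruft_pack_mtime
    repo.cruft = {}
    repo.loose = {oid: m for oid, m in repo.loose.items() if today - m < GRACE}


def simulate(mixed: bool) -> bool:
    """Return True if the unreachable object still exists after ~100 days."""
    repo = Repo(loose={"abc": 0})  # object became unreachable on day 0
    for day in range(10, 101, 10):
        new_gc(repo, day)
        if mixed:
            old_gc(repo, day + 3)  # old version runs a few days later
    return "abc" in repo.cruft or "abc" in repo.loose
```

In this model, simulate(mixed=False) returns False because the object
is pruned on the second run, while simulate(mixed=True) returns True:
every explode step resets the object's age to that of the most recent
cruft pack, so it never leaves the grace period.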

Thanks,
-Stolee


