Re: [PATCH v3 01/17] Documentation/technical: add cruft-packs.txt

Hi,

Taylor Blau wrote:
> On Mon, Mar 07, 2022 at 10:03:35AM -0800, Jonathan Nieder wrote:

>> Sorry for the very slow review!  I've mentioned a few times that this
>> overlaps in interesting ways with the gc mechanism described in
>> hash-function-transition.txt, so I'd like to compare and see how they
>> interact.
>
> Sorry for my equally-slow reply ;). I was on vacation last week and
> wasn't following the list closely.

No problem --- thanks for getting back to me.

[...]
> (After re-reading what you wrote and my response, I think we are saying
> the exact same thing, but it doesn't hurt to think aloud).

Great.  Can the doc cover this?  I think it would be helpful to make
that easy to find for others with similar questions.

If it's a matter of finding enough time to write some text, let me
know and I can try to find some time to help.

[...]
>> Can this doc say a little about how "git prune" handles these files?
>> In particular, does a non cruft pack aware copy of Git (or JGit,
>> libgit2, etc) do the right thing or does it fight with this mechanism?
>> If the latter, do we have a repository extension (extensions.*) to
>> prevent that?
>
> I mentioned this in much more detail in [1], but the answer is that the
> cruft pack looks like any other pack, it just happens to have another
> metadata file (the .mtimes one) attached to it. So other implementations
> of Git should treat it as they would any other pack. Like I mentioned in
> [1], cruft packs were designed with the explicit goal of not requiring a
> repository extension.

Sorry, the above seems like it's answering a different question than I
asked.  The doc in Documentation/technical/ seems like a natural place
to describe what semantics the new .mtimes file has, and I didn't find
that there.  Is there a different piece of documentation I should have
been looking at?

Can you tell me a little more about why we would want _not_ to have a
repository format extension?  To me, it seems like a fairly simple
addition that would drastically reduce the cognitive overload for
people considering making use of this feature.

[...]
> The key advantage of cruft packs is that you can expire unreachable
> objects in piecemeal while still retaining the benefit of being able to
> de-duplicate cruft objects and store them packed against each other.

Can you say a little more about this?  My experience with the similar
feature in JGit is that it has been helpful to be able to expire a
cruft pack altogether; since objects that became unreachable around the
same time get packed at the same time, it's not obvious to me what
benefit this extra piecemeal capability brings.

That doesn't mean the benefit doesn't exist, just that it seems like
there's a piece of context I'm still missing.

>>> +Notable alternatives to this design include:
>>
>> This doesn't mention the approach described in
>> hash-function-transition.txt (and that's already implemented and has
>> been in use for many years in JGit's DfsRepository).  Does that mean
>> you aren't aware of it?
>
> Implementing the UNREACHABLE_GARBAGE concept from
> hash-function-transition.txt in cruft pack-terms would be equivalent to
> not writing the mtimes file at all. This follows from the fact that a
> pre-cruft packs implementation of Git considers a packed object's mtime
> to be the same as the pack it's contained in. (I'm deliberately
> avoiding any details from the h-f-t document regarding re-writing
> objects contained in a garbage pack here, since this is separate from
> the pack structure itself (and could easily be implemented on top of
> cruft packs)).
>
> So I'm not sure what the alternative we'd list would be, since it
> removes the key feature of the design of cruft packs.
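
For concreteness, the mtime rule described in the quoted text can be
sketched as below.  This is only an illustration under stated
assumptions, not Git's or JGit's code: effective_mtime(), expirable(),
and the dict standing in for the contents of a .mtimes file are all
hypothetical names, and timestamps are plain integers.

```python
from typing import Dict, List, Optional


def effective_mtime(oid: str, pack_mtime: int,
                    object_mtimes: Optional[Dict[str, int]] = None) -> int:
    """Timestamp used to decide whether a packed object is old enough to expire.

    With per-object mtimes (hypothetical stand-in for a .mtimes file), the
    object's own timestamp is used.  Without them, the object inherits the
    mtime of the pack that contains it.
    """
    if object_mtimes is not None and oid in object_mtimes:
        return object_mtimes[oid]
    return pack_mtime


def expirable(oids: List[str], pack_mtime: int, cutoff: int,
              object_mtimes: Optional[Dict[str, int]] = None) -> List[str]:
    """Return the objects whose effective mtime is older than the cutoff."""
    return [oid for oid in oids
            if effective_mtime(oid, pack_mtime, object_mtimes) < cutoff]


# One pack written at t=1000 holding an old object "a" and a recent one "b".
mtimes = {"a": 100, "b": 900}

# Per-object mtimes: only "a" is older than the cutoff, so it alone expires.
print(expirable(["a", "b"], pack_mtime=1000, cutoff=500, object_mtimes=mtimes))

# No per-object mtimes: both objects inherit t=1000, so nothing expires yet.
print(expirable(["a", "b"], pack_mtime=1000, cutoff=500))
```

Under these assumptions, the first call selects only "a" and the second
selects nothing, which is the difference between expiring objects
piecemeal and expiring a pack as a whole.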

Sorry, I don't understand this answer either.  Do you mean to say that
JGit's DfsRepository does not in fact have a cruft-packs-like feature
that is live in the wild?  Or that that feature is equivalent to not
having such a feature?  Or something else?

To be clear, I'm not trying to say that that's superior to what you've
proposed here --- only that documenting the comparison would be
useful.

Puzzled,
Jonathan


