Re: [PATCH v3 01/17] Documentation/technical: add cruft-packs.txt

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 22, 2022 at 02:45:16PM -0700, Jonathan Nieder wrote:
> Hi,
>
> Taylor Blau wrote:
> > On Mon, Mar 07, 2022 at 10:03:35AM -0800, Jonathan Nieder wrote:
>
> >> Sorry for the very slow review!  I've mentioned a few times that this
> >> overlaps in interesting ways with the gc mechanism described in
> >> hash-function-transition.txt, so I'd like to compare and see how they
> >> interact.
> >
> > Sorry for my equally-slow reply ;). I was on vacation last week and
> > wasn't following the list closely.
>
> No problem --- thanks for getting back to me.
>
> [...]
> > (After re-reading what you wrote and my response, I think we are saying
> > the exact same thing, but it doesn't hurt to think aloud).
>
> Great.  Can the doc cover this?  I think it would be helpful to make
> that easy to find for others with similar questions.

I believe the doc covers this already, see the paragraph beginning with
"Unreachable objects aren't removed immediately...".

> >> Can this doc say a little about how "git prune" handles these files?
> >> In particular, does a non cruft pack aware copy of Git (or JGit,
> >> libgit2, etc) do the right thing or does it fight with this mechanism?
> >> If the latter, do we have a repository extension (extensions.*) to
> >> prevent that?
> >
> > I mentioned this in much more detail in [1], but the answer is that the
> > cruft pack looks like any other pack, it just happens to have another
> > metadata file (the .mtimes one) attached to it. So other implementations
> > of Git should treat it as they would any other pack. Like I mentioned in
> > [1], cruft packs were designed with the explicit goal of not requiring a
> > repository extension.
>
> Sorry, the above seems like it's answering a different question than I
> asked.  The doc in Documentation/technical/ seems like a natural place
> to describe what semantics the new .mtimes file has, and I didn't find
> that there.  Is there a different piece of documentation I should have
> been looking at?

Are you looking for a technical description of the mtimes file? If so,
there is a section in Documentation/technical/pack-format.txt (added in
"pack-mtimes: support reading .mtimes files") that explains this.

> Can you tell me a little more about why we would want _not_ to have a
> repository format extension?  To me, it seems like a fairly simple
> addition that would drastically reduce the cognitive overload for
> people considering making use of this feature.

There is no reason to prevent a pre-cruft packs version of Git from
reading/writing a repository that uses cruft packs, since the two
versions will still function as normal. Since there's no need to prevent
the old version from interacting with a repository that has cruft packs,
we wouldn't want to enforce an unnecessary boundary with an extension.

> [...]
> > The key advantage of cruft packs is that you can expire unreachable
> > objects in piecemeal while still retaining the benefit of being able to
> > de-duplicate cruft objects and store them packed against each other.
>
> Can you say a little more about this?  My experience with the similar
> feature in JGit is that it has been helpful to be able to expire a
> cruft pack altogether; since objects that became reachable around the
> same time get packed at the same time, it's not obvious to me what
> benefit this extra piecemeal capability brings.
>
> That doesn't mean the benefit doesn't exist, just that it seems like
> there's a piece of context I'm still missing.

Expiring objects in piecemeal is somewhat interesting, but I think I was
reaching a little too far when I said it was the "key benefit". It does
have some nice properties, like being able to store cruft objects as
deltas against other cruft objects which might get pruned at a different
time (though, of course, you'll need to re-delta them in the case you do
prune an object which is the base of another cruft object).

But the issue with having multiple cruft packs is that the semantics get
significantly more complicated. E.g., if you have an object represented
in multiple cruft packs, which mtime do you use? If you want to prune
it, you suddenly may have many packs you need to update and keep track
of.

> >>> +Notable alternatives to this design include:
> >>
> >> This doesn't mention the approach described in
> >> hash-function-transition.txt (and that's already implemented and has
> >> been in use for many years in JGit's DfsRepository).  Does that mean
> >> you aren't aware of it?
> >
> > Implementing the UNREACHABLE_GARBAGE concept from
> > hash-function-transition.txt in cruft pack-terms would be equivalent to
> > not writing the mtimes file at all. This follows from the fact that a
> > pre-cruft packs implementation of Git considers a packed object's mtime
> > to be the same as the pack it's contained in. (I'm deliberately
> > avoiding any details from the h-f-t document regarding re-writing
> > objects contained in a garbage pack here, since this is separate from
> > the pack structure itself (and could easily be implemented on top of
> > cruft packs)).
> >
> > So I'm not sure what the alternative we'd list would be, since it
> > removes the key feature of the design of cruft packs.
>
> Sorry, I don't understand this answer either.  Do you mean to say that
> JGit's DfsRepository does not in fact have a cruft packs like feature
> that is live in the wild?  Or that that feature is equivalent to not
> having such a feature?  Or something else?
>
> To be clear, I'm not trying to say that that's superior to what you've
> proposed here --- only that documenting the comparison would be
> useful.

I'm not familiar enough with JGit (or its DfsRepository class) to know
how to answer this. I was comparing cruft packs to the
UNREACHABLE_GARBAGE concept mentioned in the hash-function-transition
doc, and noting the differences there.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux