Re: [PATCH 05/24] midx: implement `DISP` chunk

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Dec 03, 2023 at 10:15:11PM +0900, Junio C Hamano wrote:
> Taylor Blau <me@xxxxxxxxxxxx> writes:
>
> > diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
> > index 3696506eb3..d130e65b28 100644
> > --- a/Documentation/git-multi-pack-index.txt
> > +++ b/Documentation/git-multi-pack-index.txt
> > @@ -49,6 +49,10 @@ write::
> >  	--stdin-packs::
> >  		Write a multi-pack index containing only the set of
> >  		line-delimited pack index basenames provided over stdin.
> > +		Lines beginning with a '+' character (followed by the
> > +		pack index basename as before) have their pack marked as
> > +		"disjoint". See the "`DISP` chunk and disjoint packs"
> > +		section in linkgit:gitformat-pack[5] for more.
>
> Makes one wonder who computes the set of packfiles, decides to
> prefix '+' to which ones, and how it does so, none of which appear
> in this step (which is understandable).  As the flow of information
> is from the producer of individual "disjoint" packs (not in this
> step) to this new logic in "--stdin-packs" to the new "DISP" chunk
> writer (the primary focus of this step) to the final consumer of
> "DISP" chunk (not in this step), we are digging from the middle
> (hopefully to both directions in other steps).  It is probably the
> easiest way to explain the idea to start from the primary data
> structures and "DISP" seems to be a good place to start.

Thanks. I found that laying out this series was rather tricky, since all
of the individual pieces really depend on the end state in order to make
any sense.

Hopefully you're satisfied with the way things are split up and
organized currently, but if you have suggestions on other ways I could
slice or dice this, please let me know.

> > +	    Two packs are "disjoint" with respect to one another when they have
> > +	    disjoint sets of objects.
> > + In other words, any object found in a pack
> > +	    contained in the set of disjoint packfiles is guaranteed to be
> > +	    uniquely located among those packs.
>
> I often advise people to rethink what they wrote _before_ "In other
> words", because the use of that phrase is a sign that the author
> considers the statement is hard to grok and needs rephrasing, in
> which case, the rephrased version may be a better way to explain the
> concept being presented without the harder-to-grok version.
>
> But I do not think this one is a good example to apply the advice.
> It is because "In other words," is somewhat misused in the sentence.
> Two "disjoint" packs do not store any common object (which is how
> you defined the adjective "disjoint" in the first sentence).  "As a
> consequence"/"Hence", an object found in one pack among many
> "disjoint" packs will not appear in others.

Thanks, I'll replace this with "As a consequence", and try to follow
that general advice more often in the future ;-).

> OK, so it seems they really need to be strictly disjoint in order to
> participate in the reuse of the existing packdata.

I think that's generally true, though there are some exceptions.

I think the real condition here is that the *reused sections* must be
disjoint with respect to one another, not necessarily the packs
themselves. So having the packs be disjoint is a sufficient condition,
since we know that no matter which section(s) we reuse, they are
guaranteed to be disjoint.

I think that there is opportunity to be more clever here, e.g., by
allowing for different disjoint "groups" of packs, or mandating that you
can only reuse certain sections from different combinations of packs in
order to satisfy this property.

That's part of the reason why I left more space than is needed for the
"disjoint" state in the DISP chunk (it is 32 bits, of which we're only
using one of them). I'm not sure that we would want more relaxed
constraints here, since they'd be harder to satisfy. But my hope is that
we would be able to learn from running this in production to figure out
whether or not such a thing would be useful.

> > +  - All objects stored as offset- or reference-deltas also include their
> > +    base object in the resulting pack.
>
> Are thin packs obsolete?

No, I think I should clarify this to make it more obvious that this only
applies to non-thin packs.

> > +		find $objdir/pack -type f -name "*.idx" | xargs -n 1 basename | sort >packs &&
>
> That is an overly-long line.

Thanks for spotting.

> > +test_expect_success 'non-disjoint packs are detected' '
> > +	test_when_finished "rm -fr repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +
> > +		test_commit base &&
> > +		git repack -d &&
> > +		test_commit other &&
> > +		git repack -a &&
> > +
> > +		ls -la .git/objects/pack/ &&
>
> Is this line a leftover debugging aid?

Indeed, thanks.

> > +		find $objdir/pack -type f -name "*.idx" |
> > +			sed -e "s/.*\/\(.*\)$/+\1/g" >in &&
>
> Lose "g"; it adds unnecessary cognitive burden to the readers if the
> patterh is expected to match multiple times, and you know that is
> not possible (your pattern is right anchored at the end).  This may
> apply equally to other uses of "sed" in this patch.

Thanks, I dropped the 'g' in both instances.

Thanks,
Taylor




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux