Re: [RFC PATCH 0/2] extend --abbrev support to diff-patch format

On 2020-08-10 11:27:05-0400, Jeff King <peff@xxxxxxxx> wrote:
> On Mon, Aug 10, 2020 at 08:15:41AM -0700, Junio C Hamano wrote:
> 
> > > 	A lot of those patches couldn't be applied cleanly to old
> > > 	versions of said software, so they require some changes from
> > > 	developers and need to be regenerated from their trimmed
> > > 	tree. Because the archive tree has significantly fewer
> > > 	objects, the abbreviation in the index line is usually shorter
> > > 	than in the original patch. Thus, it generates some noise when
> > > 	said developers try to compare the new patch with the original
> > > 	patch if there's an exact file-hunk match.
> > >
> > > 	Make the object name's abbreviation length configurable to
> > > 	reduce that noise.
> > 
> > I agree with Peff that with the above as the sole motivating use
> > case, the "--full-index" option is the right approach.  It is a much
> > more robust solution than "--abbrev=16 would be long enough for all
> > project participants to avoid length drift".  IOW these four
> > paragraphs do not argue _for_ this change, at least to me.
> 
> Yeah, that's what I was getting at: if you care about robust
> machine-readability, then the full index is the best solution. Reading
> between the lines, I think the argument may be "using --full-index is
> too long and therefore ugly, so people like the short-ish names but with
> a bit of extra safety".

My argument was that people can either easily fetch the patch via HTTP, like:

	curl -LO https://github.com/git/git/commit/eb12adc74cf22add318f884072be2071d181abaa.patch

or take it from a mailing list archive or Bugzilla, instead of
cloning the full repository. With those sources, we can't say,
"we prefer full-index, please send us the patch with full-index
instead".

> 
> There's an extra challenge here, which is that you have to convince the
> sender to use the extra --abbrev option, even though they themselves
> won't be the ones running into the problem when applying.

Not really. The sender's tree is usually larger than the archived
tree, so their abbreviation is usually long enough already. It is the
receiver who will use --abbrev to lengthen their own abbreviation and
reduce the noise.

> But I don't
> think there's an elegant solution to that (we could just bump the
> default abbrev everywhere to 12+, which is enough in practice).
> Though I'm not 100% sure that "git apply" is smart enough to only look
> at blobs (i.e., if "1234abcd" is ambiguous between a tree and a blob,
> ignore the tree since patches always apply to blobs). That might be
> another avenue that would make things more likely to work without
> anybody having to configure anything.
> 
> > On the other hand, I think you could argue that "--full-index" is
> > merely a synonym for "--abbrev=40", and the patch fixes the
> > inconsistency between the object names on the "index" line, which
> > can choose only between the default abbrev length and the full
> > abbrev length, and all the other places we show object names, which
> > uniformly honor the "--abbrev" option.

I think this argument could be a way to go.
In fact, I always try to use --abbrev with the diff family because I
know it works with a handful of other tools (describe, blame). Then
I'm surprised that it doesn't work, and the documentation tells me
`--abbrev` only applies to diff-raw output and diff-tree header lines.

Later I forget that documentation and try again.

For now, I filter out the index lines before comparing two patches.
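
As a rough sketch of that filtering (in Python; the helper names and
the two sample patches are made up for illustration, they are not part
of this series), the comparison can simply drop every "index" header
line before checking whether the remaining lines are identical:

```python
import re

# Matches the "index <old>..<new>[ <mode>]" header line of a patch.
# Hunk content lines start with ' ', '+' or '-', so they never match.
INDEX_LINE = re.compile(r"^index [0-9a-f]+\.\.[0-9a-f]+( [0-7]+)?$")


def strip_index_lines(patch_text):
    """Return the patch as a list of lines, without "index" header lines."""
    return [line for line in patch_text.splitlines()
            if not INDEX_LINE.match(line)]


def same_patch_ignoring_index(patch_a, patch_b):
    """True if the two patches differ only in their "index" lines."""
    return strip_index_lines(patch_a) == strip_index_lines(patch_b)


# Hypothetical patches: same hunk, different abbreviation lengths.
ORIGINAL = """diff --git a/foo.c b/foo.c
index 1234567890ab..abcdef012345 100644
--- a/foo.c
+++ b/foo.c
@@ -1 +1 @@
-old
+new
"""

REGENERATED = ORIGINAL.replace(
    "index 1234567890ab..abcdef012345", "index 1234567..abcdef0")

if __name__ == "__main__":
    print(same_patch_ignoring_index(ORIGINAL, REGENERATED))
```

This throws away the blob names entirely, so it only tells you the
hunks match, not that they apply to the same blobs.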

> Yeah, I certainly don't mind the extra flexibility between "full" and
> "default" for "index" lines. I do wonder if people want to configure the
> abbreviations for those lines separately from other parts. I don't know
> that I've ever particularly cared about that flexibility, but the fact
> that they were set up separately all those years ago makes me think
> somebody might.

I don't think people care that much about the index line (or, by
extension, its length), since the default number is actually a
minimum: if Git can't disambiguate an object with that many
characters, it will show a longer object name anyway.

I think most people's scripts will use a regex like:

	/index [a-z0-9]{7,}\.\.[a-z0-9]{7,} [0-7]{6}/

Or even:

	/index [a-z0-9]+\.\.[a-z0-9]+ [0-7]+/

For the former case, we could change the code in 2/2 to set the minimum
default to DEFAULT_ABBREV instead of MINIMUM_ABBREV?
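
To make the difference between the two patterns concrete, here is a
small Python check. The object names are invented, and the 4-character
line assumes an index line could be emitted with fewer than
7 characters; only in that case do the two patterns disagree:

```python
import re

# The two patterns from above, written as Python regexes.
STRICT = re.compile(r"index [a-z0-9]{7,}\.\.[a-z0-9]{7,} [0-7]{6}")
LOOSE = re.compile(r"index [a-z0-9]+\.\.[a-z0-9]+ [0-7]+")

# Hypothetical index lines at different abbreviation lengths.
SAMPLES = {
    "7 chars": "index 1234567..89abcde 100644",
    "12 chars": "index 1234567890ab..89abcdef0123 100644",
    "40 chars": "index " + "1" * 40 + ".." + "a" * 40 + " 100644",
    "4 chars": "index 1234..89ab 100644",
}


def classify(line):
    """Return (strict_matches, loose_matches) for one index line."""
    return bool(STRICT.search(line)), bool(LOOSE.search(line))


if __name__ == "__main__":
    for label, line in SAMPLES.items():
        print(label, classify(line))
```

Only the line shorter than 7 characters is rejected by the stricter
pattern, which is the case a higher minimum would avoid.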

For the historical case where users put both --full-index and --abbrev
into their scripts, we still keep our promise not to break those
scripts by always respecting --full-index, regardless of --abbrev.

-- 
Danh


