Re: [PATCH v2 1/3] range-diff/format-patch: refactor check for commit range

Uwe Kleine-König <u.kleine-koenig@xxxxxxxxxxxxxx> · Mon, 25 Jan 2021 22:25:25 +0100

Hello Junio,

On Mon, Jan 25, 2021 at 11:24:39AM -0800, Junio C Hamano wrote:
> Uwe Kleine-König <u.kleine-koenig@xxxxxxxxxxxxxx> writes:
> 
> >> > In preparation for allowing more sophisticated ways to specify commit
> >> > ranges, let's refactor the check into its own function.
> >> 
> >> I think the sharing between the two makes sense, but the helper
> >> function should make it clear in its name that this is "the kind of
> >> commit range range-diff wants to take".  Among the commit range "git
> >> log" and friends can take, range-diff can take only a subset of it,
> >> and only a subset of it is meaningful to range-diff (e.g. HEAD^@ is
> >> still a commit range you can give to "git log", but it would not
> >> make much sense to give it to range-diff).
> >
> > Does it make so little sense to forbid passing HEAD^@ as a range to
> > range-diff? I can imagine situations where is would make sense, e.g. I
> > often create customer patch stacks from a set of topic branches using
> > octopus merge. To compare two of these ^@ might be handy.
> 
> You can discuss for each individual syntax of a single-token range
> and decide which ones are and are not suitable for range-diff, but I
> suspect the reason behind this business started with dot-dot is to
> perform a superficial "sanity check" at the command line parser
> level before passing them to the revision machinery, and having to
> deal with potential errors and/or having to compare unreasonably
> large segments of history that the user did not intend.
> 
> Also I first thought that the command changes the behaviour, given
> two tokens, depending on the shape of these two tokens (i.e. when
> they satisfy the "is-range?" we are discussing, they are taken as
> two ranges to be compared, and otherwise does something else), but
> after revisiting the code and "git help range-diff", it always does
> one thing when given 
> 
>  (1) one arg: gives a symmetric range and what is to be compared
>      is its left and right half,
> 
>  (2) two args: each is meant to name a set of commits and these two
>      are to be compared) or
> 
>  (3) three args: each is meant to name a commit, and the arg1..arg2
>      and arg1..arg3 are the ranges to be compared.
> 
> so ...
> 
> > My POV is that if it's easy to use the same function (and so the same
> > set of range descriptors) for git log and git range-diff then do so.
> > This yields a consistent behaviour which is IMHO better than preventing
> > people to do things that are considered strange today.
> 
> ... I am OK with that point of view.  It certainly is simpler to
> explain to end users.

It seems you understood my argument :-)

> Having said that, it would make it much harder to implement
> efficiently, though.  For example, when your user says
> 
> 	git range-diff A B
> 
> to compare "git log A" (all the way down to the root) and "git log
> B" (likewise), you'd certainly optimize the older common part of the
> history out, essentially turning it into
> 
> 	git range-diff A..B B..A
> 
> or its moral equivalent
> 
> 	git range-diff A...B
> 
> But you cannot apply such an optimization blindly.  When the user
> gives A..B and B..A as two args, you somehow need to notice that 
> you shouldn't rewrite it to "A..B...B..A", and for that, you'd still
> need some "parsing" of these args.

I agree that for a long history

	git range-diff A B

is an expensive request and I wouldn't invest too many cycles optimizing
it. (And if I'd optimize it, it wouldn't be done using textual
combination of the two strings but by checking if the two ranges
intersect. This way something like

	git range-diff v4.0..v4.6-rc1 v4.0..v4.5.6

and maybe even

	git range-diff v4.0..v4.6-rc1 v4.0-rc1..v4.5.6

would benefit, too. But note I'm not (anymore) familiar with the git
source code, so I don't know if this is easy/sensible to do and I'm just
looking at the problem from an architectural and theoretical POV.)

> So, I dunno.  Limiting the second form to only forms that the
> implementation does not have to do such optimization would certainly
> make it simpler for Dscho to implement ;-)

I don't want to make it more complicated for Dscho, I'm happy if I can
in the near future use range-diff with $rev1^! $ref2^! . So feel free to
ignore me.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |
Attachment:
signature.asc

Description: PGP signature