Re: [PATCH v2] rebase --merge: optionally skip upstreamed commits

Elijah Newren <newren@xxxxxxxxx> · Wed, 18 Mar 2020 13:41:23 -0700

On Wed, Mar 18, 2020 at 12:55 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:
>
> >> Hmph, what was it called earlier?  My gut reaction without much
> >> thinking finds --no-skip-* a bit confusing double-negation and
> >> suspect "--[no-]detect-cherry-pick" (which defaults to true for
> >> backward compatibility) may feel more natural, but I suspect (I do
> >> not recall details of the discussion on v1) it has been already
> >> discussed and people found --no-skip-* is OK (in which case I won't
> >> object)?
> >
> > It was earlier called "--{,no-}skip-already-present" (with the opposite
> > meaning, and thus, --skip-already-present is the default), so the double
> > negative has always existed. "--detect-cherry-pick" might be a better
> > idea...I'll wait to see if anybody else has an opinion.
>
> While "--[no-]detect-cherry-pick" is much better in avoiding double
> negation, it is a horrible name---we do not tell the users what we
> do after we detect cherry pick ("--[no-]skip-cherry-pick-detection"
> does not tell us, either).

I like --[no-]detect-cherry-pick.  I'm on board with using "keep"
instead of "skip" to avoid double negation.

> Compared to them, "--[no-]skip-already-present" is much better, even
> though there is double negation.

This one seems especially bad from a discoverability and
understandability viewpoint though.  It's certainly nice if options
are fully self-documenting, but sometimes that would require full
paragraphs for the option name.  Focusing on what is done with the
option at the expense of discovering which options are relevant to
your case or at the expense of enabling users to create a mental model
for when options might be meaningful is something that I think is very
detrimental to usability.  I think users who want such an option would
have a very hard time finding this based on its name, and people who
want completely unrelated features would be confused enough by it that
they feel compelled to read its description and attempt to parse it
and guess how it's related.  In contrast, --[no-]detect-cherry-pick is
a bit clearer to both groups of people for whether it is useful, and
the group who wants it can read up the description to get the details.

> How about a name along the lines of "--[no-]keep-duplicate", then?

This name is much better than --[no-]keep-already-present would be
because "duplicate" is a far better indicator than "already-present"
of the intended meaning.  But I'm still worried the name "duplicate"
isn't going to be enough of a clue to individuals about whether they
will need this options or not.  Perhaps --[no-]keep-cherry-pick?

> >> I also wonder if --detect-cherry-pick=(yes|no|auto) may give a
> >> better end-user experience, with "auto" meaning "do run patch-ID
> >> based filtering, but if we know it will be expensive (e.g. the
> >> repository is sparsely cloned), please skip it".  That way, there
> >> may appear other reasons that makes patch-ID computation expensive
> >> now or in the fiture, and the users are automatically covered.
> >
> > It might be better to have predictability, and for "auto", I don't know
> > if we can have a simple and explainable set of rules as to when to use
> > patch-ID-based filtering - for example, in a partial clone with no
> > blobs, I would normally want no patch-ID-based filtering, but in a
> > partial clone with only a blob size limit, I probably will still want
> > patch-ID-based filtering.
>
> Perhaps.  You could have something more specific than "auto".  The
> main point was instead of "--[no-]$knob", "--$knob=(yes|no|...)" is
> much easier to extend.  I simply do not know if we will see need to
> extend the vocabulary in the near future (to which you guys who are
> more interested in sparse clones would have much better insight than
> I do).

I also struggle to understand when auto would be used.  But beyond
that, I'm still a little uneasy with where we seem to be ending up
(even if no fault of this patch):

1) Behavior has long been --keep-cherry-pick, and in various cases
that behavior can reduce conflicts users have to deal with.
2) Both Junio and I independently guessed that the cherry-pick
detection logic is poorly performing and could be improved; Jonathan
confirmed with some investigative work.  We've all suggested punting
for now, though.
3) I think we can make the sequencer machinery fast enough that the
cherry-pick detection is going to be the slowest part by a good margin
even in normal cases, not just sparse clones or the cases Taylor or I
had in mind.  So I think it's going to stick out like a sore thumb for
a lot more people (though maybe they're all happy because it's faster
overall?).
4) Jonathan provided some good examples of cases where the
--keep-cherry-pick behavior isn't just slow, but leads to actually
wrong answers (a revert followed by an un-revert).

I particularly don't like the idea of something being the default when
it can both cause wrong behavior and present a huge performance
problem that folks have to learn to workaround, especially when based
only on the tradeoff of sometimes reducing the amount of work we push
back on the user.  Maybe that's just inevitable, but does anyone have
any words that will make me feel better about this?

Elijah