Re: [PATCH] doc: revisions: improve single range explanation

Felipe Contreras <felipe.contreras@xxxxxxxxx> · Tue, 15 Jun 2021 06:53:37 -0500

Elijah Newren wrote:
> On Sun, Jun 13, 2021 at 10:09 AM Felipe Contreras
> <felipe.contreras@xxxxxxxxx> wrote:
> > Elijah Newren wrote:

> > > I tend to agree with Eric.  I think the example you chose is likely to
> > > be misinterpreted and your wording magnifies it.  A..F B..E simplifies
> > > to B..F which is *almost* the union of A..F and B..E, it's only
> > > missing A.  Off-by-one errors are easy to miss.
> >
> > Yes, but right before it's explained that the ending point is F.
> > Not E, F.
> 
> I think this is somewhat of a useless distinction -- not for the end
> result, but in terms of helping users understand.  We started adding
> an explanation to the manual because users misunderstand how
> "start1..end1 start2..end2" is treated and we want to correct their
> misunderstandings.  In that context, the only misunderstanding I can
> think of that is dispelled by specifying F is the endpoint would be
> "two ranges are intersected to get the range of commits that log will
> operate on".  I've never seen users assume that or make such a
> mistake.  I've always seen them assume that the "two ranges are
> combined with a union".

Then that warrants yet another paragraph, because this one is for:

  Commands that are specifically designed to take two distinct ranges
  (e.g. "git range-diff R1 R2" to compare two ranges) do exist, but they
  are exceptions.

Probably outside the section of Dotted Range Notations, because if the
user is confused about what 'C B A A' should do, that has nothing to do
with this dotted ranges.

Maybe after the user has been told that:

  Specifying several revisions means the set of commits reachable from
  any of the given commits.

  A commit's reachable set is the commit itself and the commits in
  its ancestry chain.

> > > You make it more likely that they'll miss it, because there are only 6
> > > commits total in the union, and you are trying to explain why listing
> > > A..F B..E while not be 8 commits, which readers can easily respond
> > > with, "Well, of course it's not 8 commits.  There's only 6.
> >
> > If the reader understands that no more than 6 commits can be returned,
> > then the reader has understood the point that the operation is not
> > addition.
> 
> Who in the world ever assumes that "two dotted ranges are combined via
> list addition"?

I don't know, but that is the paragraph we are on:

  Commands that are specifically designed to take two distinct ranges
  (e.g. "git range-diff R1 R2" to compare two ranges) do exist, but they
  are exceptions.

If you are arguing for the removal of this entire paragraph and its
examples, I'd be fine with that.

> I've only ever come across users assuming the
> operation is a union (or, equivalently, addition on sets).  I don't
> understand why you even try to make that point, and think it's a
> distraction that does more harm than good.

If you think it's impossible for the user to assume two dotted ranges
means addition, please explain what is the point of this sentence:

  Unless otherwise noted, all "git" commands that operate on a set of
  commits work on a single revision range.

> > > When you do the union operation, of course the duplicates go away",
> > > and miss the actual point that A got excluded.
> >
> > But that is not the point. This is the point:
> >
> >   Unless otherwise noted, all git commands that operate on a set of
> >   commits work on a single revision range.
> >
> > You are missing the forest for the trees.
> 
> I think you are missing the boat.
> 
> That sentence on its own is completely insufficient to dispel the
> misunderstanding.

One misunderstanding, perhaps, not the one we are trying to tackle
here.

> All that sentence says to users is that if they specify what they
> think of as "two ranges" that we'll somehow treat it as one;

Didn't you just said the user would never think it's actually two
ranges?

What's the point in saying that if the user already knows it?

> but since users are prone to think that "revision range" is
> interchangeable with "set of revisions" (especially since we defined
> A..B elsewhere in set operations), this will merely make them think in
> terms of what set operation they need to perform on the "two ranges"
> to get the set of commits the operation will function on.

That belongs in a separate paragraph.

> The example you provide should attempt to help explain why that mental
> model is mistaken and provide them with a corrected one.  Your
> response to Eric suggests you're not even trying to provide a
> corrected mental model, and your response here suggests you are trying
> to only correct mistakes of the form "take two revision ranges and add
> them keeping duplicates" and "take two revision ranges and intersect
> them", neither of which I've observed in the wild.

I'm providing an example for the paragraph that is already written.

If you want me to rewrite the entire section I can certainly give it a
try.

> Commands that are specifically designed to take two distinct ranges
> (e.g. "git range-diff R1 R2" to compare two ranges) do exist, but they
> are exceptions.  Unless otherwise noted, all git commands that operate
> on a set of commits work on a single revision range.

Isn't this obvious for all users?

> Thus, just as "A..B" translates to "^A B", the expression "A..B C..D"
> translates to "^A B ^C D", i.e. all commits reachable from either B or
> D, as long as they are not reachable from either A or C.

How about we remove the entire paragraph and replace it with:

  When specifying two ranges, such as 'A..B C..D', the way this is
  interpreted is as a single range '^A B ^C D', that is: all commits
  reachable from either B or D, as long as they are not reachable from
  either A or C. Assuming a linear history, B would be reachable from C,
  so this is the same as '^C D'.

-- 
Felipe Contreras