Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks for your thoughtful response. I answered in detail below, but I
think that we're in agreement about the semantics, with a few
corrections on my part. I'd like to push forward with this series,
including the proposed changes below, but feel that sending it as v7
would be asking too much of a reviewer. Would it be OK if I sent this a
new series entirely and we abandon this thread?

On Fri, May 18, 2018 at 03:27:44PM +0900, Junio C Hamano wrote:
> Taylor Blau <me@xxxxxxxxxxxx> writes:
>
> >   * `git grep --and -e foo -e bar`. This binary operation should recur
> >     on its sub-expressions and take the minimum of the starting offset
> >     and the maximum of the ending offset.
>
> We use infix notation, so the above is "git grep -e foo --and -e
> bar" actually ;-).

Thanks for catching that :-).

> But you raise an interesting point.  A line with "hello foo bar baz"
> on it would match, so does a line with "goodbye bar baz foo", as
> both of them hits pattern "foo" *and* pattern "bar".  It is not
> quite clear what it means to "show the first hit on the line".  One
> interpretation would be to take the minimum span that makes both
> sides of "--and" happy (your "minimum of start, maximum of end").

It's funny you should mention: this was nearly the exact phrase I used
when speaking with Peff.

> Another might be to pick "foo" in the first and "bar" in the second
> line, as that is the "first hit" on the line, which is consistent
> with how "git grep -e foo" would say about "a foo b foo c foo" (I
> expect that the leftmost "foo" would be the first hit).  So there
> may be multiple, equally plausible answer to the question.

This is the largest fact in my mind pertaining to this discussion: there
are probably many different interpretations of semantics for this, all
equally valid in their own way. I am partial to the minimum substring
interpretation (which follows naturally from the minimum-start,
maximum-end idea), accepting the shortcoming that `--or` sometimes
doesn't ``do the right thing.''

> >     For inputs of the form "foobar" and "foo bar", it will do the right
> >     thing (give the starting and ending offset for "foobar" and give no
> >     match, respectively).
>
> I think I agree with "foobar", but I do not understand why there is
> no match for "foo bar".

Ah, I think this is my mistake -- when I wrote this note last night. The
case of `-e foo --and -e bar` should clearly match both `foo bar` _and_
`foobar`.

> >   * `git grep --or -e foo -e bar`. This is the most complicated case, in
> >     my opinion. In going with the min/max idea in the and case above, I
> >     think that `--or` should also min/max its sub-expressions, but in
> >     fact we short-circuit evaluating the second sub-expression when we
> >     find a match for the first.
>
> I am not sure I follow.  "git grep -e foo --or -e bar" is just a
> longhand for "git grep -e foo -e bar".  Shouldn't it highlight
> whichever between foo and bar that appears leftmost on the line?

I don't believe that the two are treated the same, but I think that this
is another case where I was incorrect in my judgement of the
implementation last night. In fact, the only time when we _don't_ recur
on both sub-expressions of `--or` is when 'collect_hits' is zero. That's
fine, I believe.

> >     So, in cases like matching `--or -e foo -e bar` with "foo baz bar",
> >     we'll do the right thing, since `foo` is the first sub-expression
> >     and happens to be the left-most match. In other words, we __adhere
> >     to our answer with the left-most match first__ semantics, but only
> >     because __the first sub-expression is the left-most match__.
> >
> >     In the other case where we try and match the same expression against
> >     "bar baz foo", we'll return the starting offset of "foo", even
> >     though it isn't the left-most match, violating our semantics.
>
> I am not sure why you think your min-starting/max-ending would lead
> to such a conclusion.  'foo baz bar' would be covered in its
> entirety, 'bar baz foo' would also, as starting of hits with pattern
> 'foo' and pattern 'bar' would be 'b' in 'bar' on that three-word
> line, and ending of hits with these two patterns would be the last
> 'o' in 'foo' on the line.

Right, I think with the understanding in my last stanza of this response
("I don't believe that ...") this issue is resolved, and the
min-starting/max-ending _will_ do the right thing.

> I'd expect that a line 'foo baz bar' matched against "-e foo --or -e
> bar" would say "among three words on me, 'f' in foo is the first
> location of the match", though.

I would, too.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux