On Sat, May 12, 2018 at 03:07:04PM +0900, Junio C Hamano wrote: > Taylor Blau <me@xxxxxxxxxxxx> writes: > > > I re-read your note and understand more clearly now what your suggestion > > is. To ensure that we're in agreement, do you mean: > > > > 1. '--column -v' will _never_ give a column, but will never die(), > > either > > No, I don't. > > > 2. '--column --[and | or | not]' will never give a column, but will > > also never die(), either. > > No, I don't. > > If a file does not have substring "foo", then > > git grep -v -e foo file > git grep --not -e foo file > > would hit all lines, just like > > git grep -e '.*' file > > does. > > I would expect that all of these > > git grep --column/-o -v -e foo file > git grep --column/-o --not -e foo file > git grep --column/-o -e '.*' file > > give the same output, which is what we would get if we consider the > hit from "choose lines that lack 'foo'" on a line without 'foo' is > caused by the entire contents on the line. That is in line with > "choose lines that has anything (including nothing)" aka ".*" would > result in the entire line being reported via -o. The byte offset of > the first hit on such a line reported by --column is also 1, and > that is a good and real answer to the question "git grep --column/-o" > can give. I agree with your message now and thank you for explaining what you had written. I spoke with Peff off-list for a while to determine what I think is essentially the answer to ``what are a set of semantics for filling out a regmatch_t given an extended expression?'' It's helpful to recognize that the extended expressions are implemented very much like a tree, so a reasonable semantics will lend itself well to the way in which match_expr_eval() is implemented. Here's what we came up with: * `git grep -e foo`. This is the case where the extended expression has a single atomic node in its tree. This falls into the "just call match_one_pattern()" case and has a simple answer: the starting offset and ending offset are that of whatever match_one_pattern gives. * `git grep --not -e foo`. This has the set of semantics that you describe above (the starting offset is 1), with the addition that the ending offset is the end of the line. This is similar to the fact that `--not foo` is very similar to `.$`. * `git grep --and -e foo -e bar`. This binary operation should recur on its sub-expressions and take the minimum of the starting offset and the maximum of the ending offset. For inputs of the form "foobar" and "foo bar", it will do the right thing (give the starting and ending offset for "foobar" and give no match, respectively). * `git grep --or -e foo -e bar`. This is the most complicated case, in my opinion. In going with the min/max idea in the and case above, I think that `--or` should also min/max its sub-expressions, but in fact we short-circuit evaluating the second sub-expression when we find a match for the first. So, in cases like matching `--or -e foo -e bar` with "foo baz bar", we'll do the right thing, since `foo` is the first sub-expression and happens to be the left-most match. In other words, we __adhere to our answer with the left-most match first__ semantics, but only because __the first sub-expression is the left-most match__. In the other case where we try and match the same expression against "bar baz foo", we'll return the starting offset of "foo", even though it isn't the left-most match, violating our semantics. So, I propose we adopt the following: use the trivial answer for "foo", the whole line for "--not", and min/max the starting/ending offsets for binary operators, knowing that we will sometimes produce a weird answer for --or. I think that the semantics for --or are OK to go forward with, but would be interested in the thoughts of others to figure out whether this is sensible to everyone else. Does this seem like an OK approach? Perhaps Peff can clarify some of what's shared here, since we did speak elsewhere about it. Thanks, Taylor