Re: [PATCH] blame: allow -L n,m to have an m bigger than the file's line count

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 10 Feb 2010 10:58:02 -0800

Jay Soffian <jaysoffian@xxxxxxxxx> writes:

> On Wed, Feb 10, 2010 at 2:27 AM, Stephen Boyd <bebarino@xxxxxxxxx> wrote:
>> and get what I want but that isn't very discoverable. If the range is
>> greater than the number of lines just truncate the range to go up to
>> the end of the file.
>
> I agree this is the right thing to do. I'm working on a patch to
> support matching multiple times when given a regex range and made just
> that change as well. :-)

I would mildly suggest against going in that direction.

This is merely "mildly", because truncating 99999 in "-L200,99999" to the
number of lines in the target blob would _not_ hurt.  But it is an ugly
hack.  I would be Ok with coding that special case, but I do not want to
see it advertised, especially if we are making "omission defaults to the
end" the documented way to explicitly say "I don't care to count, just do
it til the end".

While we are talking about touching the vicinity, I think we should
tighten the -L s,e parsing rules a bit further.

There is an undocumented code that swaps start and end if the given end is
smaller than the start.  This triggers even when "-L280,300" is mis-typed
as "-L280,30".  I was bitten by this more than once---when the input does
not make sense, we should actively error out, instead of doing a wrong
thing.  I suspect that I coded it that way _only_ to support this pattern:

	-L'/^#endif \/\* !WINDOWS \/\*/,-30'

i.e. "blame 30 lines before the '#endif' line".  But the code also
internally turns "-L50,-20" into "-L 50,30" and then swaps them to may
make it "-L30,50"; this was merely an unintended side effect.

I do want to see -L'/regexp/,-offset' keep working, I do not mind if we
keep taking "-L50,-20" as an unintuitive way to spell "-L30,50", or reject
"-L50,-20" as a nonsense.  But I do want to see us reject "-L280,30" as a
typo.

As to use of more than one -L option, especially when the start (or end
for that matter) is specified with an regexp, I am of two minds.

When annotating the body of two functions, frotz and nitfol, I might
expect this to work:

    -L'/^int frotz(/,/^}/'  -L'/^int nitfol(/,/^}/'

regardless of the order these functions appear in the blob (i.e. nitfol
may be defined first).  This requires that parsing of "regexp" needs to
reset to the beginning of blob for each -L option (iow, multiple -L are
parsed independently from each other).

But at the same time, if I am actually looking at the blob contents in one
terminal while spelling the blame command line in another, it would be
nicer if the multiple -L looked for patterns incrementally.  I may
appreciate if I can write the above command line as:

    -L'/^int frotz(/,/^}/'  -L'/nitfol/,/^}/'

when I can see in my "less" of the blob contents in the other terminal
that the first line that has string "nitfol" after the end of the
definition of "frotz" is the beginning of function "nitfol".

Another thing we _might_ want to consider doing is something like:

    -L'*/^#ifdef WINDOWS/,/^#endif \/\* WINDOWS \/\*/'

to tell it "I don't care to count how many WINDOWS ifdef blocks there are;
grab all of them".

Regardless of how parsing of multiple -L goes, you need to be careful to
sort the resulting line ranges and possibly coalesce them when there are
overlaps (e.g. "-L12,+7 -L10,+5" should become "-L10,17").  And be careful
about refcounting of origin.  You'll be making multiple blame_ent and
queuing them to the scoreboard when starting, all pointing at the blame
target blob; the origin blob needs to start with the right number of
references to keep origin_decref() discarding it.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html