Re: [PATCH] grep: Add option --max-line-len

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Marc-Antoine Ruel <maruel@xxxxxxxxxxxx> writes:

> This tells git grep to skip files longer than a specified length,
> which is often the result of generators and not actual source files.
>
> ...
> +-M<num>::
> +--max-line-len<num>::
> +	Match the pattern only for line shorter or equal to this length.
> +

All the excellent review comments from Eric I agree with.

With the name of the option and the above end-user facing
description, it is very clear that the only thing this feature does
is to declare that an overlong line does _not_ match when trying to
check against any pattern.

That is a much clearer definition and description than random new
features people propose here (and kicked back by reviewers, telling
them to make the specification clearer), and I'd commend you for that.

But it still leaves at least one thing unclear.  How should it
interact with "-v"?  If we consider an overlong line never matches,
would "git grep -v <pattern>" should include the line in its output?

Speaking of the output, it also makes me wonder if the feature
really wants to include an overlong line as a context line when
showing a near-by line that matches the pattern when -A/-B/-C/-W
options are in use. Even though it is clear that it does from the
above description, is it really the best thing the feature can do to
help the end users?

Which leads me to suspect that this "feature" might not be the ideal
you wanted to achive, but is an approximate substitution that you
found is "good enough" to simulate what the real thing you wanted to
do, especially when I go back and read the justfication in the
proposed log message that talks about "result of generators".

Isn't it a property of the entire file, not individual lines, if you
find it uninteresting to see reported by "git grep"?  I cannot shake
the suspicion that this feature happened to have ended up in this
shape, instead of "ignore a file with a line this long", only
because your starting point was to use "has overlong lines" as the
heuristic for "not interesting", and because "git grep" code is not
structured to first scan the entire file to decide if it is worth
working on it, and it is extra work to restructure the codeflow to
make it so (which you avoided).

If your real motivation was either

 (1) whether the file has or does not have the pattern for certain
     class of files are uninteresting; do not even run "grep"
     processing for them; or

 (2) hits or no-hits may be intereseting but output of overlong
     lines from certain class of files I do not wish to see;

then I can think of two alternatives.

For (1), can't we tell "result of generators" and other files with
pathspec?  If so, perhaps a negative pathspec can rescue.  e.g.

    git grep <pattern> -- '*.cc' ':!*-autogen.cc'

For (2), can't we model this after how users can tell "git diff"
that certain paths are not worth computing and showing textual
patches for, which is to Unset the 'diff' attribute?  When you have

    *-autogen.cc	-diff

in your .gitattributes, "git diff" would say "Binary files A and B
differ" instead of explaining line-by-line differences in the patch
form.  Perhaps we can also have a 'grep' attribute and squelch the
output if it is Unset?  

It is debatable but one could propose extending the use of existing
'diff' attribute to cover 'grep' too, with an argument that anything
not worth showing patch (i.e. 'diff' attribute is Unset) is not
worth showing grep hits from.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux