Re: [PATCH] grep: Add option --max-line-len

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 23, 2017 at 10:41 AM, Marc-Antoine Ruel <maruel@xxxxxxxxxxxx> wrote:
> This tells git grep to skip files longer than a specified length,

s/files/lines/

> which is often the result of generators and not actual source files.
>
> Signed-off-by: Marc-Antoine Ruel <maruel@xxxxxxxxxxxx>
> ---
> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> @@ -127,6 +128,10 @@ OPTIONS
> +-M<num>::
> +--max-line-len<num>::
> +       Match the pattern only for line shorter or equal to this length.

This documentation doesn't explain what it means by a line's length.
This implementation seems to take into consideration only the line's
byte count, however, given that displayed lines are normally
tab-expanded, people might intuitively expect that expansion to count
toward the length. A similar question arises for Unicode characters.

Should this option take tab-expansion and Unicode into account?
Regardless of the answer to that question, the documentation should
make it clear what "line length" means.

> diff --git a/builtin/grep.c b/builtin/grep.c
> @@ -796,6 +796,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>                         N_("case insensitive matching")),
>                 OPT_BOOL('w', "word-regexp", &opt.word_regexp,
>                         N_("match patterns only at word boundaries")),
> +               OPT_INTEGER('M', "max-line-len", &opt.max_line_length,
> +                       N_("ignore lines longer than <n>")),

On this project, it is typical to have only the long form of an option
name when the option is first implemented so as not to squat on one of
the relatively limited number of short option names. Only after it is
seen that an option is popular, does it get a short name. Whether this
actually matters in this case, I don't know, but is something to take
into consideration.

Why the choice of 'M'? Is there precedence in other Git or Unix
commands for that name over, say, 'N' or
<choose-your-favorite-letter>? Is 'M' is for "max"? If so, what would
a short name for --max-depth be (if it ever got one)? (These are
somewhat rhetorical questions, but I'm interested in the answers if
you have any.)

> diff --git a/grep.c b/grep.c
> @@ -1273,6 +1275,8 @@ static int match_line(struct grep_opt *opt, char *bol, char *eol,
> +       if (opt->max_line_length > 0 && eol-bol > opt->max_line_length)
> +               return 0;

If the user specifies "-M0", should that error out or at least warn
the user that the value is non-sensical? What about -1, etc.? (These
are UX-related questions; the implementation obviously doesn't care
one way or the other.)

> diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
> @@ -766,6 +766,11 @@ test_expect_success 'grep -W shows no trailing empty lines' '
> +test_expect_success 'grep skips long lines' '
> +       git grep -M18 -W include >actual &&
> +       test_cmp expected actual
> +'

Meh, what is this actually testing? The output of "git grep -M18 -W
include" is no different that the output of "git grep -W include" in
the test just above this one. And, why is -W in this test at all? Its
presence confuses the reader into thinking that it is somehow
significant. I would have expected this test to look like this:

    cat >expected <<-\EOF
    hello.c:#include <stdio.h>
    EOF

    test_expect_success 'grep -M skips long lines' '
        git grep -M18 include >actual &&
        test_cmp expected actual
    '



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux