Re: [PATCH v2] grep: add --max-count command line option

"Carlos L." <00xc@xxxxxxxxxxxxxx> · Wed, 22 Jun 2022 13:23:04 +0000

Hi,

On Wednesday, June 22nd, 2022 at 12:22, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:

> On Tue, Jun 21 2022, Carlos L. via GitGitGadget wrote:
>
> > From: =?UTF-8?q?Carlos=20L=C3=B3pez?= 00xc@xxxxxxxxxxxxxx
> >
> > This patch adds a command line option analogous to that of GNU
> > grep(1)'s -m / --max-count, which users might already be used to.
>
>
> Thanks, this seems useful.
>
> > This makes it possible to limit the amount of matches shown in the
> > output while keeping the functionality of other options such as -C
> > (show code context) or -p (show containing function), which would be
> > difficult to do with a shell pipeline (e.g. head(1)).
>
>
> We start multi-threaded grep workers, how does this code handle races
> between them finding things, this count being incremented, and the "do
> we have sufficient results?" check?
>
> Is it guarded by the relevant mutexes?

AFAICT only a single thread runs on each file via grep_source_1(), and we check `count`, which is local to this function.

> > + /*
> > + * Optimize out the case where the amount of matches is limited to zero.
> > + * We do this to keep results consistent with GNU grep(1).
> > + */
> > + if (opt.max_count == 0)
> > + exit(EXIT_FAILURE);
>
>
> Don't use exit() in cmd_grep(), you should use "return 1".

I'll use return in my follow-up patch, this can be improved afterwards.

> But even better use usage_msg_opt() here, i.e. inform the user why this
> was bad.
>
> Or hrm, it seems GNU grep silently returns 1 here, perhaps --max-count=0
> is a feature for some?
>
> If this is intentional it's worth documenting and testing it explicitly.

I will add a sentence about this in Documentation/git-grep.txt.

> Re the comments from others about size_t or whatever, it might be better
> here to use OPT_CALLBACK and an unsigned type.
>
> Then just have a "int have_max_count:1", which IMO is more obvious than
> using integer wrap-around to test "didn't provide this flag".

FWIW, I think it's fine to use int and a negative value as an special encoding, max_depth does the same thing. These are per-file matches, so they should not go over 2 billion in reasonable use cases.