Re: [PATCH v2] grep: provide sane default to grep_source struct

Jeff King <peff@xxxxxxxx> · Thu, 16 May 2019 18:02:54 -0400

On Thu, May 16, 2019 at 02:44:44PM -0700, Emily Shaffer wrote:

> grep_buffer creates a struct grep_source gs and calls grep_source()
> with it. However, gs.name is null, which eventually produces a
> segmentation fault in
> grep_source()->grep_source_1()->show_line() when grep_opt.status_only is
> not set.

Funny wrapping here.

> This seems to be unreachable from existing commands but is reachable in
> the event that someone rolls their own revision walk and neglects to set
> rev_info->grep_filter->status_only. Conceivably, someone might want to
> print all changed lines of all commits which match some regex.
> 
> To futureproof, give developers a heads-up if grep_source() is called
> and a valid name field is expected but not given. This path should be
> unreachable now, but in the future may become reachable if developers
> want to add the ability to use grep_buffer() in prepared in-core data
> and provide hints to the user about where the data came from.

So I guess this is probably my fault, as I was the one who factored out
the grep_source bits long ago (though I would also not be surprised if
it could be triggered before, too).

I think adding a BUG() is the right thing to do.

> I've added the BUG() line to grep_source(). I considered adding it to
> grep_source_1() but didn't see much difference. Looks like
> grep_source_1() exists purely as a helper to grep_source() and is never
> called by anybody else... but it's the function where the crash would
> happen. So I'm not sure.

I agree it probably doesn't matter. I'd probably stick at at the top of
grep_source_1(), just with the rationale that it could possibly catch
more cases if somebody ever refactors grep_source() to have two
different callers (e.g., imagine we later add a variant that takes more
options, but leave the old one to avoid disrupting existing callers).

> I also modified the wording for the BUG("") statement a little from
> jrn's suggestion to hopefully make it more explicit, but I welcome
> wordsmithing.
> [...]
> +		BUG("grep call which could print a name requires "
> +		    "grep_source.name be non-NULL");

It looks fine to me. The point is that nobody should ever see this. And
if they do, we should already get a file/line number telling us where
the problem is (and a coredump to poke at). So as long as it's enough to
convince a regular user that they should make a report to the mailing
list, it will have done its job.

> diff --git a/grep.c b/grep.c
> index 0d50598acd..6ceb880a8c 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -2021,6 +2021,9 @@ static int chk_hit_marker(struct grep_expr *x)
>  
>  int grep_source(struct grep_opt *opt, struct grep_source *gs)
>  {
> +	if (!opt->status_only && gs->name == NULL)
> +		BUG("grep call which could print a name requires "
> +		    "grep_source.name be non-NULL");
>  	/*
>  	 * we do not have to do the two-pass grep when we do not check
>  	 * buffer-wide "all-match".

Minor nit, but can we put a blank line between the new block and the
comment?

> @@ -2045,7 +2048,11 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
>  	struct grep_source gs;
>  	int r;
>  
> -	grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL);
> +	/* TODO: In the future it may become desirable to pass in the name as
> +	 * an argument to grep_buffer(). At that time, "(in core)" should be
> +	 * replaced.
> +	 */
> +	grep_source_init(&gs, GREP_SOURCE_BUF, _("(in core)"), NULL, NULL);

Hmm. I don't see much point in this one, as it would just avoid
triggering our BUG(). If somebody is adding new grep_buffer() callers
that don't use status_only, wouldn't we want them to see the BUG() to
know that they need to refactor grep_buffer() to provide a name?

-Peff