Re: [RFC PATCH] grep: provide sane default to grep_source struct

Emily Shaffer <emilyshaffer@xxxxxxxxxx> · Thu, 16 May 2019 14:05:58 -0700

On Wed, May 15, 2019 at 08:11:52PM -0700, Jonathan Nieder wrote:
> Hi,
> 
> Emily Shaffer wrote:
> 
> > grep_buffer creates a struct grep_source gs and calls grep_source()
> > with it. However, gs.name is null, which eventually produces a
> > segmentation fault in
> > grep_source()->grep_source_1()->show_line() when grep_opt.status_only is
> > not set.
> 
> Thanks for catching it.  Taking a step back, I think the problem is in
> the definition of "struct grep_source":
> 
> 	struct grep_source {
> 		char *name;
> 
> 		enum grep_source_type {
> 			GREP_SOURCE_OID,
> 			GREP_SOURCE_FILE,
> 			GREP_SOURCE_BUF,
> 		} type;
> 		void *identifier;
> 
> 		...
> 	};
> 
> What is the difference between a 'name' and an 'identifier'?  Who is
> responsible for free()ing them?  Can they be NULL?  This is pretty
> underdocumented for a public type.
> 
> If we take the point of view that 'name' should always be non-NULL,
> then I wonder:
> 
> - can we document that?
> - can grep_source_init enforce that?

Today grep_source_init() defaults to NULL. So if we decide that 'name'
should be non-NULL it will be somewhat changing the intent.

	void grep_source_init(struct grep_source *gs, enum grep_source_type type,        
	                      const char *name, const char *path,                        
	                      const void *identifier)                                    
	{                                                                                
	        gs->type = type;                                                         
	        gs->name = xstrdup_or_null(name); 
	...

> - can we take advantage of that in grep_source as well, as a sanity
>   check that the grep_source has been initialized?
> - while we're here, can we describe what the field is used for
>   (prefixing output with context before a ":", I believe)?

In general the documentation for grep.[ch] is pretty light. There aren't
any header comments and `Documentation/technical/api-grep.txt` is a
todo. So I agree that we should document it anywhere we can.

> > Jonathan Nieder proposed alternatively adding some check to grep_source()
> > to ensure that if opt->status_only is unset, gs->name must be non-NULL
> > (and yell about it if not), as well as some extra comments indicating
> > what assumptions are made about the data coming into functions like
> > grep_source(). I'm fine with that as well (although I'm not sure it
> > makes sense semantically to require a name which the user probably can't
> > easily set, or else ban the user from printing LOC during grep). Mostly
> > I'm happy with any solution besides a segfault with no error logging :)
> 
> Let's compare the two possibilities.
> 
> The advantage of "(in memory)" is that it Just Works, which should
> make a nicer development experience with getting a new caller mostly
> working on the way to getting them working just the way you want.
> 
> The disadvantage is that if we start outputting that in production, we
> and static analyzers are less likely to notice.  In other words,
> printing "(in memory)" is leaking details to the end user that do not
> match what the end user asked for.  NULL would instead produce a
> crash, prompting the author of the caller to fix it.
> 
> What was particularly pernicious about this example is that the NULL
> dereference only occurs if the grep has a match.  So I suppose I'm
> leaning toward (in addition to adding some comments to document the
> struct) adding a check like
> 
> 	if (!gs->name && !opt->status_only)
> 		BUG("grep calls that could print name require name");
> 
> to grep_source.

Why not both? :)

But seriously, I am planning to push a second patch with both, per
Junio's reply.

I'll consider the documentation out of scope for now since I'm not sure
I know enough about grep.[ch]'s intent or history to document anything
yet. :)

> 
> That would also sidestep the question of whether this debugging aid
> should be translated. :)
> 
> Sensible?
> 
> Thanks,
> Jonathan