On Wed, May 15, 2019 at 08:11:52PM -0700, Jonathan Nieder wrote:
> Hi,
>
> Emily Shaffer wrote:
>
> > grep_buffer creates a struct grep_source gs and calls grep_source()
> > with it. However, gs.name is null, which eventually produces a
> > segmentation fault in
> > grep_source()->grep_source_1()->show_line() when grep_opt.status_only is
> > not set.
>
> Thanks for catching it.  Taking a step back, I think the problem is in
> the definition of "struct grep_source":
>
> 	struct grep_source {
> 		char *name;
>
> 		enum grep_source_type {
> 			GREP_SOURCE_OID,
> 			GREP_SOURCE_FILE,
> 			GREP_SOURCE_BUF,
> 		} type;
> 		void *identifier;
>
> 		...
> 	};
>
> What is the difference between a 'name' and an 'identifier'?  Who is
> responsible for free()ing them?  Can they be NULL?  This is pretty
> underdocumented for a public type.
>
> If we take the point of view that 'name' should always be non-NULL,
> then I wonder:
>
> - can we document that?
> - can grep_source_init enforce that?

Today grep_source_init() defaults to NULL. So if we decide that 'name'
should be non-NULL it will be somewhat changing the intent.

void grep_source_init(struct grep_source *gs, enum grep_source_type type,
                      const char *name, const char *path,
                      const void *identifier)
{
        gs->type = type;
        gs->name = xstrdup_or_null(name);
...

> - can we take advantage of that in grep_source as well, as a sanity
>   check that the grep_source has been initialized?
> - while we're here, can we describe what the field is used for
>   (prefixing output with context before a ":", I believe)?

In general the documentation for grep.[ch] is pretty light. There aren't
any header comments and `Documentation/technical/api-grep.txt` is a
todo. So I agree that we should document it anywhere we can.
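
To make the "defaults to NULL" point concrete, here is a minimal
standalone sketch of how a NULL name passes straight through an
init function of this shape. Everything in it is a hypothetical
stand-in (sketch_strdup_or_null, struct sketch_named_source and its
init/clear helpers); it only mirrors the NULL-in, NULL-out behavior of
xstrdup_or_null() and is not the code in grep.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for xstrdup_or_null(): a NULL input stays NULL,
 * anything else is copied to freshly allocated memory.
 */
static char *sketch_strdup_or_null(const char *s)
{
	char *copy;

	if (!s)
		return NULL;
	copy = malloc(strlen(s) + 1);
	if (!copy)
		abort(); /* this sketch simply dies on allocation failure */
	strcpy(copy, s);
	return copy;
}

/* Hypothetical, pared-down source struct: only the field under discussion. */
struct sketch_named_source {
	char *name; /* owned by the struct in this sketch; may be NULL */
};

static void sketch_named_source_init(struct sketch_named_source *gs,
				     const char *name)
{
	assert(gs != NULL);
	/* A NULL name from the caller is stored as NULL, not rejected. */
	gs->name = sketch_strdup_or_null(name);
}

static void sketch_named_source_clear(struct sketch_named_source *gs)
{
	assert(gs != NULL);
	free(gs->name);
	gs->name = NULL;
}
```

In this sketch the struct owns its copy of the name, so the caller's
string can go away after init, and nothing stops a caller from leaving
name NULL. Whether that matches the intended contract of the real
struct is exactly the open documentation question.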

> > Jonathan Nieder proposed alternatively adding some check to grep_source()
> > to ensure that if opt->status_only is unset, gs->name must be non-NULL
> > (and yell about it if not), as well as some extra comments indicating
> > what assumptions are made about the data coming into functions like
> > grep_source(). I'm fine with that as well (although I'm not sure it
> > makes sense semantically to require a name which the user probably can't
> > easily set, or else ban the user from printing LOC during grep). Mostly
> > I'm happy with any solution besides a segfault with no error logging :)
>
> Let's compare the two possibilities.
>
> The advantage of "(in memory)" is that it Just Works, which should
> make a nicer development experience with getting a new caller mostly
> working on the way to getting them working just the way you want.
>
> The disadvantage is that if we start outputting that in production, we
> and static analyzers are less likely to notice.  In other words,
> printing "(in memory)" is leaking details to the end user that do not
> match what the end user asked for.  NULL would instead produce a
> crash, prompting the author of the caller to fix it.
>
> What was particularly pernicious about this example is that the NULL
> dereference only occurs if the grep has a match.  So I suppose I'm
> leaning toward (in addition to adding some comments to document the
> struct) adding a check like
>
> 	if (!gs->name && !opt->status_only)
> 		BUG("grep calls that could print name require name");
>
> to grep_source.

Why not both? :)

But seriously, I am planning to push a second patch with both, per
Junio's reply. I'll consider the documentation out of scope for now
since I'm not sure I know enough about grep.[ch]'s intent or history to
document anything yet. :)

>
> That would also sidestep the question of whether this debugging aid
> should be translated. :)
>
> Sensible?
>
> Thanks,
> Jonathan
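
For illustration, a rough standalone sketch of what "both" could look
like: the buffer entry point supplies "(in memory)" as a placeholder
name, and the common path refuses to continue when a name is missing
but output could be printed. All of the types and functions below
(struct sketch_opt, struct sketch_source, SKETCH_BUG, sketch_*) are
hypothetical stand-ins rather than the real grep.[ch] code, and the
sketch assumes the buffer is a single NUL-terminated line.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, pared-down stand-ins for the real option and source types. */
struct sketch_opt {
	int status_only; /* nonzero: report hit/miss only, print nothing */
};

struct sketch_source {
	const char *name; /* prefix printed before ":" on a match */
	const char *buf;  /* assumed NUL-terminated in this sketch */
};

/* Stand-in for BUG(): report the programming error and abort. */
#define SKETCH_BUG(msg) \
	do { fprintf(stderr, "BUG: %s\n", (msg)); abort(); } while (0)

/* True when the call could print a name but none was provided. */
static int sketch_name_missing(const struct sketch_opt *opt,
			       const struct sketch_source *gs)
{
	return !gs->name && !opt->status_only;
}

static int sketch_grep_source(const struct sketch_opt *opt,
			      const struct sketch_source *gs,
			      const char *pattern)
{
	int hit;

	assert(gs->buf != NULL);
	/* Checked up front, so it fires whether or not there is a match. */
	if (sketch_name_missing(opt, gs))
		SKETCH_BUG("grep calls that could print name require name");

	hit = strstr(gs->buf, pattern) != NULL;
	if (hit && !opt->status_only)
		printf("%s:%s\n", gs->name, gs->buf);
	return hit;
}

static int sketch_grep_buffer(const struct sketch_opt *opt,
			      const char *buf, const char *pattern)
{
	struct sketch_source gs;

	gs.name = "(in memory)"; /* placeholder so printing never sees NULL */
	gs.buf = buf;
	return sketch_grep_source(opt, &gs, pattern);
}
```

With status_only unset, sketch_grep_buffer(&opt, "hello world", "world")
prints "(in memory):hello world" and returns 1. A caller that builds a
source by hand without a name aborts at the check instead of crashing
only on the first match.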