Re: [PATCH v2 6/9] list-objects-filter-options: make filter_spec a strbuf

Matthew DeVore <matvore@xxxxxxxxxxx> · Mon, 10 Jun 2019 17:34:56 -0700

On Mon, Jun 10, 2019 at 01:13:54PM -0700, Junio C Hamano wrote:
> Matthew DeVore <matvore@xxxxxxxxxx> writes:
> 
> > -	filter_options->filter_spec = strdup(core_partial_clone_filter_default);
> > +	if (!filter_options->filter_spec.buf)
> > +		strbuf_init(&filter_options->filter_spec, 0);
> 
> This part made me go "Huh?" a bit.
> 
> Do we document that .buf==NULL means an uninitialized strbuf that is
> safe to run strbuf_init() on?  I do not mind that as a general

Kind of. The first bullet point in strbuf.h says:

 *  - The `buf` member is never NULL, so it can be used in any usual C
 *    string operations safely. strbuf's _have_ to be initialized either by
 *    `strbuf_init()` or by `= STRBUF_INIT` before the invariants, though.

So I extrapolated that if buf is NULL, it must be because it was just xcalloc'd
and not initialized. One possible improvement to the API would be to refactor
it such that there is no STRBUF_INIT, but a zero-initialized strbuf is valid.
If you expect to get a non-NULL buf, even for a zero-initialized strbuf, you
should call a function like strbuf_nonnull_buf(&buf), and that will return the
slop buf if buf is null, or the actual buf if it is non-null.

I don't understand why the API designer was so strict about requiring the
buffer to be set to non-null, since it's quite a burden for API users. If I
eagerly set all filter_options's strbuf's to STRBUF_INIT, it involves changing
a couple of global variables which currently do not need an initializer, and it
would make the code a bit messy. The structs which have a strbuf somewhere in
their nested fields would need to know that, and set up an initialization macro
to avoid the null buf.

I kind of suspect the right short-term fix is to avoid strbuf's and use a
string_list, which I join later to a full string when needed.

> convention, and it may even be a useful one (i.e. it allows you to
> calloc() a structure with an embedded strbuf in it and the "if
> .buf==NULL, call strbuf_init() lazily" can become an established
> pattern), but at the same time it feels a bit brittle.  

Is it brittle because a strbuf may be initialized to non-zero memory, and so
the "if (buf.buf == NULL)" may evaluate to false, and then go on treating
garbage like a valid buffer? I would think that's almost impossible because of
the use of xcalloc.

The only reason I realized the strbuf_init was necessary was not because I read
the documentation, but because I mistakenly called strbuf_reset, which calls
strbuf_setlen, which doesn't handle a null buf. Many other functions seem to
handle it well semi-accidentially. After I ran into the crash, I finally read
the documentation I cited above.

> 
> Such a convention forces everybody who might want to use such an
> embedded strbuf to first check .buf==NULL and lazily initialize
> it---and at some point when the embedded strbuf to be used by enough
> codepaths, it would make the code more robust by giving up on the
> lazy initialization (iow, when *filter_options is initialized, run
> strbuf_init() on its .filter_spec field).