Re: [PATCH v2] strvec: `strvec_splice()` to a statically initialized vector

Rubén Justo <rjusto@xxxxxxxxx> · Wed, 4 Dec 2024 23:22:01 +0100

On Wed, Dec 04, 2024 at 11:26:27AM +0000, karthik nayak wrote:

> Nit: Is the commit subject missing a verb?

I guess something like "To strvec_splice" sounded good in my head 
:)

> 
> > We use a singleton empty array to initialize a `struct strvec`,
> > similar to the empty string singleton we use to initialize a `struct
> > strbuf`.
> >
> 
> So a singleton empty array is a statically allocated array element, so
> for strvec, this would be `const char *empty_strvec[] = { NULL }`.
> 
> > Note that an empty strvec instance (with zero elements) does not
> > necessarily need to be an instance initialized with the singleton.
> > Let's refer to strvec instances initialized with the singleton as
> > "empty-singleton" instances.
> >
> 
> Right, so when we add elements ideally, we ideally check whether it is a
> singleton or not. This is evident in `strvec_push_nodup()`:
> 
>     void strvec_push_nodup(struct strvec *array, char *value)
>     {
>     	if (array->v == empty_strvec)
>     		array->v = NULL;
> 
>     	ALLOC_GROW(array->v, array->nr + 2, array->alloc);
>     	array->v[array->nr++] = value;
>     	array->v[array->nr] = NULL;
>     }
> 
> >     As a side note, this is the current `strvec_pop()`:
> >
> >     void strvec_pop(struct strvec *array)
> >     {
> >     	if (!array->nr)
> >     		return;
> >     	free((char *)array->v[array->nr - 1]);
> >     	array->v[array->nr - 1] = NULL;
> >     	array->nr--;
> >     }
> >
> >     So, with `strvec_pop()` an instance can become empty but it does
> >     not going to be the an "empty-singleton".
> 
> Correct, since we simply set the array element to NULL, but this is
> still a dynamically allocated array.
> 
> Nit: The sentence reads a bit weirdly.
> 
> > This "empty-singleton" circumstance requires us to be careful when
> > adding elements to instances.  Specifically, when adding the first
> > element:  we detach the strvec instance from the singleton and set the
> > internal pointer in the instance to NULL.  After this point we apply
> > `realloc()` on the pointer.  We do this in `strvec_push_nodup()`, for
> > example.
> >
> > The recently introduced `strvec_splice()` API is expected to be
> > normally used with non-empty strvec's.  However, it can also end up
> > being used with "empty-singleton" strvec's:
> >
> >        struct strvec arr = STRVEC_INIT;
> >        int a = 0, b = 0;
> >
> >        ... no modification to arr, a or b ...
> >
> >        const char *rep[] = { "foo" };
> >        strvec_splice(&arr, a, b, rep, ARRAY_SIZE(rep));
> >
> > So, we'll try to add elements to an "empty-singleton" strvec instance.
> >
> > Avoid misapplying `realloc()` to the singleton in `strvec_splice()` by
> > adding a special case for "empty-singleton" strvec's.
> >
> 
> So everything said here makes sense, that's a great explanation.

Thanks.

> 
> > Signed-off-by: Rubén Justo <rjusto@xxxxxxxxx>
> > ---
> >
> > This iteration adds more detail to the message plus a minor change to
> > remove some unnecessary parentheses.
> >
> > Junio: My message in the previous iteration was aimed at readers like
> > Patrick, who is also the author of `strvec_splice()`.  I certainly
> > assumed too much prior knowledge, which made the review unnecessarily
> > laborious.
> >
> > Rereading what I wrote last night, perhaps the problem now is excess.
> > I hope not. In any case, here it is :-)
> >
> 
> I would say this is very useful over the first iteration, considering I
> am someone without prior knowledge here.

I'm glad to read that.  I guess Junio is to blame ;)  Thanks.

> 
> > Thanks.
> >
> >  strvec.c              | 10 ++++++----
> >  t/unit-tests/strvec.c | 10 ++++++++++
> >  2 files changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/strvec.c b/strvec.c
> > index d1cf4e2496..087c020f5b 100644
> > --- a/strvec.c
> > +++ b/strvec.c
> > @@ -61,16 +61,18 @@ void strvec_splice(struct strvec *array, size_t idx, size_t len,
> >  {
> >  	if (idx + len > array->nr)
> >  		BUG("range outside of array boundary");
> > -	if (replacement_len > len)
> > +	if (replacement_len > len) {
> > +		if (array->v == empty_strvec)
> > +			array->v = NULL;
> >  		ALLOC_GROW(array->v, array->nr + (replacement_len - len) + 1,
> >  			   array->alloc);
> > +	}
> >  	for (size_t i = 0; i < len; i++)
> >  		free((char *)array->v[idx + i]);
> > -	if (replacement_len != len) {
> > +	if ((replacement_len != len) && array->nr)
> >  		memmove(array->v + idx + replacement_len, array->v + idx + len,
> >  			(array->nr - idx - len + 1) * sizeof(char *));
> > -		array->nr += (replacement_len - len);
> > -	}
> > +	array->nr += replacement_len - len;
> 
> Why is this second block of changes needed? Will array-nr ever be 0 when
> we reach here?

I'm not sure I understand your questions.

At that point, `array->nr` is the initial number of entries in the
vector.  It can be 0 when `strvec_splice()` is applied to an empty
vector.

We are moving the line where we update "array->nr" outside the `if`
block because we want to do it even when we are not moving existing
entries.  Again, this happens when `strvec_splice()` is applied to an
empty vector.

Finally, we don't mind too much (or value more the simplicity) of the
now unconditional update of "array->nr" because a clever compiler will
give us the third arm of the if: "else -> do nothing".  When
`replacement_len == len` => "array->nr += 0" => do nothing.

> 
> >  	for (size_t i = 0; i < replacement_len; i++)
> >  		array->v[idx + i] = xstrdup(replacement[i]);
> >  }
> 
> [snip]

Thank you for your review.