Re: [PATCH v3 5/6] histograms: Add traceeval insert

Ross Zwisler <zwisler@xxxxxxxxxx> · Tue, 8 Aug 2023 17:51:14 -0600

On Tue, Aug 08, 2023 at 07:32:49PM -0400, Stevie Alvarez wrote:
> On Tue, Aug 08, 2023 at 01:59:27PM -0600, Ross Zwisler wrote:
> > On Tue, Aug 08, 2023 at 12:11:58PM -0400, Stevie Alvarez wrote:
> > > From: Stevie Alvarez (Google) <stevie.6strings@xxxxxxxxx>
> > > 
> > > traceeval_insert() updates the provided struct traceeval's histogram.
> > > If an entry that exists with a keys field that match the keys argument,
> > > the entries vals field are updated with a copy of the vals argument.
> > > If such an entry does not exist, a new entry is added to the histogram,
> > > with respect to the keys and vals arguments.
> > > 
> > > Signed-off-by: Stevie Alvarez (Google) <stevie.6strings@xxxxxxxxx>
> > > ---
> > >  include/traceeval-hist.h |   4 ++
> > >  src/histograms.c         | 106 +++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 110 insertions(+)
> > > 
> > > diff --git a/include/traceeval-hist.h b/include/traceeval-hist.h
> > > index 4923de1..e713c70 100644
> > > --- a/include/traceeval-hist.h
> > > +++ b/include/traceeval-hist.h
<>
> > > +/*
> > > + * Update @entry's vals field with a copy of @vals, with respect to @teval.
> > > + *
> > > + * Return 0 on success, -1 on error.
> > > + */
> > > +static int update_entry(struct traceeval *teval, struct entry *entry,
> > > +			const union traceeval_data *vals)
> > > +{
> > > +	union traceeval_data *new_vals;
> > > +
> > > +	if (copy_traceeval_data_set(teval->nr_val_types, teval->val_types,
> > > +				vals, &new_vals) == -1)
> > > +		return -1;
> > > +
> > > +	entry->vals = new_vals;
> > 
> > Are we leaking the old 'entry->vals', as we never free that memory anywhere? 
> 
> Thanks for catching this!
> > 
> > Should we just update in-place, instead of allocating a new one & freeing the
> > old?  The vals arrays should each be the same size because they have a
> > common 'teval->val_types'. 
> > 
> > I know 'vals' is owned by the caller, but we allocated and own 'entry->vals'
> > via a previous call to create_entry().
> > 
> > We'll have to free existing strings & allocate new ones (and do something
> > similar for dynamic types when they're supported), but the rest of
> > 'entry->vals' should be reusable I think.
> 
> My worry about doing this in place is we can't guarantee the operation
> to be atomic if an error occurs when trying to copy a string. Take the
> following traceeval instance for example...
> 
> 	teval->val_types = {
> 		{ .type = TRACEEVAL_TYPE_STRING },
> 		{ .type = TRACEEVAL_TYPE_STRING },
> 		{ .type = TRACEEVAL_TYPE_NONE }
> 	}
> 
> Say an entry is being updated, and we've freed the first string and
> placed a copy of the new string in its place. We then move on to the
> second string; we attempt to make a copy of the new string, but it
> fails, leaving the state changed.
> Of course, we can keep track of all the original strings and free them
> at the very end. Is that more efficient than copying the new vals,
> freeing entry->vals via clean_data(), and then assigning the new vals
> to entry->vals? My immediate response is that the in place approach is
> just as good as the later, but I could be thinking about this the wrong
> way. Is my logic here sound? Thoughts?

Sure, that concern makes sense to me.  Keeping track of all the original vals
seems complex, and I agree that just making a new 'vals' from scratch and then
freeing the old is fine.  This will let us re-use the free path we already
have in 'clean_data()' which is preferable to having 2 separate cleanup paths.