Re: [PATCH 07/14] refs/iterator: separate lifecycle from iteration

shejialuo <shejialuo@xxxxxxxxx> · Wed, 19 Feb 2025 20:41:03 +0800

On Wed, Feb 19, 2025 at 12:52:06PM +0100, Patrick Steinhardt wrote:
> > > @@ -935,23 +931,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
> > >  	return ref_iterator_peel(iter->iter0, peeled);
> > >  }
> > >  
> > > -int ref_iterator_abort(struct ref_iterator *ref_iterator)
> > > +void ref_iterator_free(struct ref_iterator *ref_iterator)
> > >  {
> > > -	return ref_iterator->vtable->abort(ref_iterator);
> > > +	if (ref_iterator) {
> > > +		ref_iterator->vtable->release(ref_iterator);
> > > +		/* Help make use-after-free bugs fail quickly: */
> > > +		ref_iterator->vtable = NULL;
> > > +		free(ref_iterator);
> > > +	}
> > >  }
> > >  
> > 
> > So, when calling `ref_iterator_free`, we will call corresponding
> > "release" method to release the resources associated and then we free
> > the iterator. Would this be too complicated? From my view, we could just
> > make the `ref_iterator_abort` name unchanged but add a new variable such
> > as "unsigned int free_iterator". And we change each "abort" callback to
> > avoid free the iterator. This would be much simpler.
> 
> I think adding a separate variable to track whether or not things should
> be freed would make things way more complicated. I would claim the
> opposite: the fact that the patch removes 100 lines of code demonstrates
> quite neatly that the new design is way simpler and needs less logic.
> 

I agree with you with the current implementation, we don't need to worry
about calling "abort" callback in "advance". And the logic is simpler.
When writing this comment, I have thought that there is a situation that
we just want to call "release" callback. So, that's the reason why I
have asked you why not just add a new variable.

However, we don't call "release" callback expect in `ref_iterator_free`.

[snip]

> > 2. I don't think we need to make things complicated. From my
> > understanding, the motivation here is that we don't want to `advance`
> > callback to call `abort` callback. I want to ask an _important_ question
> > here: what is the motivation we rename `abort` to `release` in the first
> > place? As far as I know, we only call this callback in the newly created
> > "ref_iterator_free". Although release may be more accurate, this change
> > truly causes overhead of this patch.
> 
> We have to touch up all of these functions anyway, so renaming them at
> the same point doesn't feel like it adds any more complexity.
> 

I will explain this in the later.

[snip]

> > Writing here, I have always thought that there is a situation that we
> > just want to release the resources but not want to free the iterator
> > itself. That's why I am wondering why just add a new variable to do.
> > However, if we just want to make the lifecycle out, we just delete each
> > "abort" code where it frees the iterator. And we free the iterator in
> > the "ref_iterator_abort". Should this be enough?
> 
> As mentioned, I don't think a new variable would lead to a simplified
> architecture. With re-seekable iterators the caller is the one who needs
> to control whether the iterator should or should not be free'd, as they
> are the only one who knows whether they want to reuse it. So making it
> the responsibility of the callers to release the iterator is the proper
> way to do it.
> 

Yes, actually I think you have misunderstood my meaning here. I have
abandoned the idea to add a new variable when writing here. After
reading through the patch, I know that your motivation. My point is
"However, ..." part.

> I also don't quite see the complexity argument. The patch rather clearly
> shows that we're _reducing_ complexity as it allows us to drop around a
> 100 lines of code. We can stop worrying about whether or not we have to
> call `ref_iterator_abort()` and don't have to worry about any conditions
> that leak from the iterator subsystem to the callers. We just free the
> iterator once we're done with it and call it a day.
> 

Yes, I agree with you that we truly reduce complexity here. And as you
have said, the user allocate the iterator and free the iterator. With
this, we make call sequence clearer.

But there is one thing I want to argue with. I don't think we need to
rename "abort" callback to "release" and also "ref_iterator_abort" to
"ref_iterator_free" for the following reasons:

1. We never call "release" expect in the "ref_iterator_free" function.
For other exposed functions "ref_iterator_advance", "ref_iterator_peel"
and the original "ref_iterator_abort". We will just call the registered
callback "advance", "peel" or "abort" via virtual table. I somehow think
we should follow this pattern. But I don't know actually.
2. When I read the patch yesterday, I really wonder what is the
difference between "release" and "free". Why do we only change the
"ref_iterator_abort" to "ref_iterator_free" but for the callback, we
rename "abort" to "release". I know that you want to distinguish to
emphasis that we won't free the iterator but only release its resource
for ref iterator. But could abort also mean this?

Thanks,
Jialuo