Re: [PATCH v2 2/3] fs: add file_ref

On Mon, Oct 07, 2024 at 11:07:51AM GMT, Linus Torvalds wrote:
> On Mon, 7 Oct 2024 at 07:24, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> >
> > +static __always_inline __must_check bool file_ref_get(file_ref_t *ref)
> > +{
> > +       /*
> > +        * Unconditionally increase the reference count with full
> > +        * ordering. The saturation and dead zones provide enough
> > +        * tolerance for this.
> > +        */
> > +       if (likely(!atomic_long_add_negative(1, &ref->refcnt)))
> > +               return true;
> > +
> > +       /* Handle the cases inside the saturation and dead zones */
> > +       return __file_ref_get(ref);
> > +}
> 
> Ack. This looks like it does the right thing.
> 
> That said, I wonder if we could clarify this code sequence by using
> 
>         long old = atomic_long_fetch_inc(&ref->refcnt);
>         if (old > 0)
>                 return true;
>         return __file_ref_get(ref, old);
> 
> instead, which would obviate all the games with using the magic
> offset? IOW, it could use 0 as "free" instead of the special
> off-by-one "-1 is free".
> 
> The reason we have that special "add_negative()" thing is that this
> *used* to be much better on x86, because "xadd" was only added in the
> i486, and used to be a few cycles slower than "lock ; add".
> 
> We got rid of i386 support some time ago, and the lack of xadd was
> _one_ of the reasons for it (the supervisor mode WP bit handling issue
> was the big one).
> 
> And yes, "add_negative()" still generates simpler code, in that it
> just returns the sign flag, but I do feel it makes that code harder to
> follow.  And I say that despite being very aware of the whole

Thanks for the background. That was helpful!

Switching to atomic_long_fetch_inc() would change the logic quite a bit,
though, because atomic_long_fetch_inc() returns the previous value. In
the current scheme atomic_long_add_negative() signals that an overflow
happened, and we then drop into the slowpath, which reads the updated
value and resets the counter to FILE_REF_DEAD if we overflowed.
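Just to illustrate what I mean, here's a rough userspace model of the
current scheme, with C11 atomics standing in for the kernel helpers. The
zone constants mirror the rcuref layout this is derived from; treat the
exact values (and the simplified slowpath) as assumptions for the sketch,
not the kernel's definitions:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative zone constants, mirroring the rcuref layout. */
#define REF_MAXREF    0x7FFFFFFFFFFFFFFFUL
#define REF_SATURATED 0xA000000000000000UL
#define REF_RELEASED  0xC000000000000000UL
#define REF_DEAD      0xE000000000000000UL

typedef struct { _Atomic unsigned long cnt; } ref_t;

/*
 * Slowpath: the increment left the valid zone. Note that it reads the
 * *updated* value, so it can pin the counter correctly no matter which
 * thread raced us into the saturation or dead zone.
 */
static bool ref_get_slowpath(ref_t *ref)
{
	unsigned long c = atomic_load(&ref->cnt);

	if (c >= REF_RELEASED) {
		/* Imbalanced get() on a released ref: pin to DEAD. */
		atomic_store(&ref->cnt, REF_DEAD);
		return false;
	}
	/* Overflowed past MAXREF: saturate and leak the object. */
	atomic_store(&ref->cnt, REF_SATURATED);
	return true;
}

/*
 * Fast path: atomic_long_add_negative() modelled as "did the
 * incremented value end up with the sign bit set?".
 */
static bool ref_get(ref_t *ref)
{
	if ((long)(atomic_fetch_add(&ref->cnt, 1) + 1) >= 0)
		return true;
	return ref_get_slowpath(ref);
}
```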

So if T1 overflows but accidentally happens to be the last thread doing
a file_ref_get(), then T1 will reset the counter to FILE_REF_DEAD.

But iiuc, if we rely on the previous value instead, T1 wouldn't reset
the counter when it overflows while happening to be the last thread,
e.g. because it is stuck in some operation that takes a long time. It
would also mean that T1 overflows but still returns success.

We could handle such cases either by using atomic_long_inc_return() or
by manually checking the previous value against FILE_REF_MAXREF and
passing a +1 or -1 value to the slowpath helpers.
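A sketch of the manual-check variant, same userspace model and assumed
constants as above (and with the "0 == free" handling from your proposed
fast path omitted for brevity): since atomic_fetch_add() only hands back
the previous value, the overflow check has to become an explicit compare
against the top of the valid zone.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative zone constants, mirroring the rcuref layout. */
#define REF2_MAXREF    0x7FFFFFFFFFFFFFFFUL
#define REF2_SATURATED 0xA000000000000000UL
#define REF2_RELEASED  0xC000000000000000UL
#define REF2_DEAD      0xE000000000000000UL

typedef struct { _Atomic unsigned long cnt; } ref2_t;

static bool ref2_get(ref2_t *ref)
{
	unsigned long old = atomic_fetch_add(&ref->cnt, 1);

	/*
	 * "old > 0" alone is not enough: old == REF2_MAXREF is a
	 * perfectly valid previous value, yet it means *our* increment
	 * just overflowed, so we must saturate here rather than simply
	 * return success.
	 */
	if (old < REF2_MAXREF)
		return true;

	if (old >= REF2_RELEASED) {
		/* Imbalanced get() on a released ref: pin to DEAD. */
		atomic_store(&ref->cnt, REF2_DEAD);
		return false;
	}

	/* We (or a racing thread) overflowed: saturate and leak. */
	atomic_store(&ref->cnt, REF2_SATURATED);
	return true;
}
```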

But imho I find the current code easier to follow. Maybe I'm being held
hostage by having read the rcuref code a bit too often by now. But if
you don't feel strongly I'd leave it as is.



