Re: [PATCH] refcount: Strengthen inc_not_zero()

On Wed, Feb 5, 2025 at 7:03 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Tue, Jan 28, 2025 at 3:51 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> >
> > On Mon, Jan 27, 2025 at 11:21 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Jan 27, 2025 at 6:09 AM Will Deacon <will@xxxxxxxxxx> wrote:
> > > >
> > > > On Fri, Jan 17, 2025 at 03:41:36PM +0000, Will Deacon wrote:
> > > > > On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote:
> > > > > > On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:
> > > > > >
> > > > > > > Notably, it means refcount_t is entirely unsuitable for anything
> > > > > > > SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
> > > > > > > conditions after the refcount succeeds.
> > > > > > >
> > > > > > > And this is probably fine, but let me ponder this all a little more.
> > > > > >
> > > > > > Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
> > > > > > fix this, these things are hard enough as they are.
> > > > > >
> > > > > > Will, others, wdyt?
> > > > >
> > > > > We should also update the Documentation (atomic_t.txt and
> > > > > refcount-vs-atomic.rst) if we strengthen this.
> > > > >
> > > > > > ---
> > > > > > Subject: refcount: Strengthen inc_not_zero()
> > > > > >
> > > > > > For speculative lookups, where a successful inc_not_zero() pins the
> > > > > > object but we still need to double check that the object acquired
> > > > > > is indeed the one we set out to acquire, this validation needs to
> > > > > > happen *after* the increment.
> > > > > >
> > > > > > Notably, SLAB_TYPESAFE_BY_RCU is one such example.
> > > > > >
> > > > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > > > > > ---
> > > > > >  include/linux/refcount.h | 15 ++++++++-------
> > > > > >  1 file changed, 8 insertions(+), 7 deletions(-)
> > > > > >
> > > > > > diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> > > > > > index 35f039ecb272..340e7ffa445e 100644
> > > > > > --- a/include/linux/refcount.h
> > > > > > +++ b/include/linux/refcount.h
> > > > > > @@ -69,9 +69,10 @@
> > > > > >   * its the lock acquire, for RCU/lockless data structures its the dependent
> > > > > >   * load.
> > > > > >   *
> > > > > > - * Do note that inc_not_zero() provides a control dependency which will order
> > > > > > - * future stores against the inc, this ensures we'll never modify the object
> > > > > > - * if we did not in fact acquire a reference.
> > > > > > + * Do note that inc_not_zero() does provide acquire order, which will order
> > > > > > + * future loads and stores against the inc; this ensures all subsequent accesses
> > > > > > + * are from this object and not anything previously occupying this memory --
> > > > > > + * consider SLAB_TYPESAFE_BY_RCU.
> > > > > >   *
> > > > > >   * The decrements will provide release order, such that all the prior loads and
> > > > > >   * stores will be issued before, it also provides a control dependency, which
> > > > > > @@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
> > > > > >     do {
> > > > > >             if (!old)
> > > > > >                     break;
> > > > > > -   } while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
> > > > > > +   } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
> > > > >
> > > > > Hmm, do the later memory accesses need to be ordered against the store
> > > > > part of the increment or just the read? If it's the former, then I don't
> > > > > think that _acquire is sufficient -- accesses can still get in-between
> > > > > the read and write parts of the RmW.
> > > >
> > > > I dug some more into this at the end of last week. For the
> > > > SLAB_TYPESAFE_BY_RCU where we're racing inc_not_zero() with
> > > > dec_and_test(), then I think using _acquire() above is correct as the
> > > > later references can only move up into the critical section in the case
> > > > that we successfully obtained a reference.
> > > >
> > > > However, if we're going to make the barriers implicit in the refcount
> > > > operations here then I think we also need to do something on the producer
> > > > side for when the object is re-initialised after being destroyed and
> > > > allocated again. I think that would necessitate release ordering for
> > > > refcount_set() so that whatever allows the consumer to validate the
> > > > object (e.g. sequence number) is published *before* the refcount.
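> > > >
> > > > As a minimal sketch (a hypothetical helper built on the existing
> > > > atomic_set_release(); the name is made up here, not an existing
> > > > API), that could look like:
> > > >
> > > > /* sketch only: proposed helper, not a current kernel API */
> > > > static inline void refcount_set_release(refcount_t *r, int n)
> > > > {
> > > >     /*
> > > >      * Release ordering: all prior stores initialising the object
> > > >      * (e.g. the sequence number) are published before a racing
> > > >      * inc_not_zero() can observe the refcount as non-zero.
> > > >      */
> > > >     atomic_set_release(&r->refs, n);
> > > > }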
> > >
> > > Thanks Will!
> > > I would like to expand on your answer with an example of the race
> > > that would happen without release ordering in the producer. To save
> > > the reader's time, I provide a simplified flow and reasoning first.
> > > More detailed code for what I consider a typical SLAB_TYPESAFE_BY_RCU
> > > refcounting example is included at the end of my reply (Addendum).
> > > The simplified flow looks like this:
> > >
> > > consumer:
> > >     obj = lookup(collection, key);
> > >     if (!refcount_inc_not_zero(&obj->ref))
> > >         return;
> > >     smp_rmb(); /* Peter's new acquire fence */
> > >     if (READ_ONCE(obj->key) != key) {
> > >         put_ref(obj);
> > >         return;
> > >     }
> > >     use(obj->value);
> > >
> > > producer:
> > >     old_key = obj->key;
> > >     remove(collection, old_key);
> > >     if (!refcount_dec_and_test(&obj->ref))
> > >         return;
> > >     obj->key = KEY_INVALID;
> > >     free(obj);
> > >     ...
> > >     obj = malloc(); /* obj is reused */
> > >     obj->key = new_key;
> > >     obj->value = new_value;
> > >     smp_wmb(); /* Will's proposed release fence */
> > >     refcount_set(&obj->ref, 1);
> > >     insert(collection, new_key, obj);
> > >
> > > Let's consider the case when new_key == old_key; we'll call both of
> > > them "key". Without Will's proposed fence, the following reordering is
> > > possible:
> > >
> > > consumer:
> > >     obj = lookup(collection, key);
> > >
> > >                  producer:
> > >                      key = obj->key
> > >                      remove(collection, key);
> > >                      if (!refcount_dec_and_test(&obj->ref))
> > >                          return;
> > >                      obj->key = KEY_INVALID;
> > >                      free(obj);
> > >                      obj = malloc(); /* obj is reused */
> > >                      refcount_set(&obj->ref, 1);
> > >                      obj->key = key; /* same key */
> > >
> > >     if (!refcount_inc_not_zero(&obj->ref))
> > >         return;
> > >     smp_rmb();
> > >     if (READ_ONCE(obj->key) != key) {
> > >         put_ref(obj);
> > >         return;
> > >     }
> > >     use(obj->value);
> > >
> > >                      obj->value = new_value; /* reordered store */
> > >                      insert(collection, key, obj);
> > >
> > > So, the consumer finds the old object, successfully takes a refcount
> > > and validates the key. The validation succeeds because the object is
> > > allocated and has the same key, which is fine. However, the consumer
> > > then proceeds to use a stale obj->value. Will's proposed release
> > > ordering would prevent that.
> > >
> > > The example in https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
> > > omits all these ordering issues for SLAB_TYPESAFE_BY_RCU.
> > > I think it would be better to introduce two new functions,
> > > refcount_add_not_zero_acquire() and refcount_set_release(), and to
> > > clearly document that they should be used when a freed object can be
> > > recycled and reused, as in the SLAB_TYPESAFE_BY_RCU case. The
> > > refcount_set_release() documentation should also clarify that once it
> > > is called, the object can be accessed by consumers even if it has not
> > > yet been added to the collection used for object lookup (as in the
> > > example above). The SLAB_TYPESAFE_BY_RCU comment at
> > > https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
> > > can then explicitly use these new functions in the example code,
> > > further clarifying their purpose and proper use.
> > > WDYT?
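> > >
> > > For concreteness, a minimal sketch of the "not zero" side, modeled on
> > > the __refcount_add_not_zero() quoted above (my proposal only, with
> > > saturation/overflow handling omitted); refcount_set_release() would be
> > > the release-ordered setter discussed earlier:
> > >
> > > /* sketch only: proposed API, overflow checks omitted */
> > > static inline __must_check bool
> > > refcount_add_not_zero_acquire(int i, refcount_t *r)
> > > {
> > >     int old = refcount_read(r);
> > >
> > >     /* acquire ordering on success orders all later loads and stores */
> > >     do {
> > >         if (!old)
> > >             break;
> > >     } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
> > >
> > >     return old;
> > > }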
> >
> > Hi Peter,
> > Should I take a stab at preparing a patch to add the two new
> > refcounting functions suggested above with updates to the
> > documentation and comments?
> > If you disagree with that or need more time to decide then I'll wait.
> > Please let me know.
>
> Not sure if "--in-reply-to" worked, but I just posted a patch adding
> new refcounting APIs for SLAB_TYPESAFE_BY_RCU here:
> https://lore.kernel.org/all/20250206025201.979573-1-surenb@xxxxxxxxxx/

Since I did not get any replies other than Vlastimil's Ack on the
above patch, I went ahead and posted v10 of my patchset [1] and
included the patch above in it [2]. Feedback is highly appreciated!

[1] https://lore.kernel.org/all/20250213224655.1680278-1-surenb@xxxxxxxxxx/
[2] https://lore.kernel.org/all/20250213224655.1680278-11-surenb@xxxxxxxxxx/


> Since Peter seems to be busy, I discussed these ordering requirements
> for SLAB_TYPESAFE_BY_RCU with Paul McKenney, and he was leaning toward
> having separate functions with the additional fences for this case.
> That's what I provided in my patch.
> Another possible option is to add acquire ordering in
> __refcount_add_not_zero(), as Peter suggested, and to add a
> refcount_set_release() function.
> Thanks,
> Suren.
>
>
> > Thanks,
> > Suren.
> >
> >
> > >
> > > ADDENDUM.
> > > Detailed code for typical use of refcounting with SLAB_TYPESAFE_BY_RCU:
> > >
> > > struct object {
> > >     refcount_t ref;
> > >     u64 key;
> > >     u64 value;
> > > };
> > >
> > > void init(struct object *obj, u64 key, u64 value)
> > > {
> > >     obj->key = key;
> > >     obj->value = value;
> > >     smp_wmb(); /* Will's proposed release fence */
> > >     refcount_set(&obj->ref, 1);
> > > }
> > >
> > > bool get_ref(struct object *obj, u64 key)
> > > {
> > >     if (!refcount_inc_not_zero(&obj->ref))
> > >         return false;
> > >     smp_rmb(); /* Peter's new acquire fence */
> > >     if (READ_ONCE(obj->key) != key) {
> > >         put_ref(obj);
> > >         return false;
> > >     }
> > >     return true;
> > > }
> > >
> > > void put_ref(struct object *obj)
> > > {
> > >     if (!refcount_dec_and_test(&obj->ref))
> > >         return;
> > >     obj->key = KEY_INVALID;
> > >     free(obj);
> > > }
> > >
> > > consumer:
> > >     obj = lookup(collection, key);
> > >     if (!get_ref(obj, key))
> > >         return;
> > >     use(obj->value);
> > >
> > > producer:
> > >     remove(collection, old_obj->key);
> > >     put_ref(old_obj);
> > >     new_obj = malloc();
> > >     init(new_obj, new_key, new_value);
> > >     insert(collection, new_key, new_obj);
> > >
> > > With SLAB_TYPESAFE_BY_RCU, old_obj in the producer can be reused and
> > > end up equal to new_obj.
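> > >
> > > With the proposed helpers, init() and get_ref() above could drop the
> > > explicit fences -- a sketch assuming the new APIs land roughly as
> > > named, with refcount_inc_not_zero_acquire() as the i == 1 wrapper:
> > >
> > > /* sketch only: assumes the proposed *_acquire/*_release APIs */
> > > void init(struct object *obj, u64 key, u64 value)
> > > {
> > >     obj->key = key;
> > >     obj->value = value;
> > >     /* release: key/value are published before the refcount */
> > >     refcount_set_release(&obj->ref, 1);
> > > }
> > >
> > > bool get_ref(struct object *obj, u64 key)
> > > {
> > >     /* acquire on success: the key check cannot read stale data */
> > >     if (!refcount_inc_not_zero_acquire(&obj->ref))
> > >         return false;
> > >     if (READ_ONCE(obj->key) != key) {
> > >         put_ref(obj);
> > >         return false;
> > >     }
> > >     return true;
> > > }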
> > >
> > >
> > > >
> > > > Will
