On Tue, Jan 28, 2025 at 3:51 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Mon, Jan 27, 2025 at 11:21 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> >
> > On Mon, Jan 27, 2025 at 6:09 AM Will Deacon <will@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Jan 17, 2025 at 03:41:36PM +0000, Will Deacon wrote:
> > > > On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote:
> > > > > On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:
> > > > >
> > > > > > Notably, it means refcount_t is entirely unsuitable for anything
> > > > > > SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
> > > > > > conditions after the refcount succeeds.
> > > > > >
> > > > > > And this is probably fine, but let me ponder this all a little more.
> > > > >
> > > > > Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
> > > > > fix this, these things are hard enough as they are.
> > > > >
> > > > > Will, others, wdyt?
> > > >
> > > > We should also update the Documentation (atomic_t.txt and
> > > > refcount-vs-atomic.rst) if we strengthen this.
> > > >
> > > > > ---
> > > > > Subject: refcount: Strengthen inc_not_zero()
> > > > >
> > > > > For speculative lookups where a successful inc_not_zero() pins the
> > > > > object, but where we still need to double check if the object acquired
> > > > > is indeed the one we set out to aquire, needs this validation to happen
> > > > > *after* the increment.
> > > > >
> > > > > Notably SLAB_TYPESAFE_BY_RCU is one such an example.
> > > > >
> > > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > > > > ---
> > > > >  include/linux/refcount.h | 15 ++++++++-------
> > > > >  1 file changed, 8 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> > > > > index 35f039ecb272..340e7ffa445e 100644
> > > > > --- a/include/linux/refcount.h
> > > > > +++ b/include/linux/refcount.h
> > > > > @@ -69,9 +69,10 @@
> > > > >   * its the lock acquire, for RCU/lockless data structures its the dependent
> > > > >   * load.
> > > > >   *
> > > > > - * Do note that inc_not_zero() provides a control dependency which will order
> > > > > - * future stores against the inc, this ensures we'll never modify the object
> > > > > - * if we did not in fact acquire a reference.
> > > > > + * Do note that inc_not_zero() does provide acquire order, which will order
> > > > > + * future load and stores against the inc, this ensures all subsequent accesses
> > > > > + * are from this object and not anything previously occupying this memory --
> > > > > + * consider SLAB_TYPESAFE_BY_RCU.
> > > > >   *
> > > > >   * The decrements will provide release order, such that all the prior loads and
> > > > >   * stores will be issued before, it also provides a control dependency, which
> > > > > @@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
> > > > >  	do {
> > > > >  		if (!old)
> > > > >  			break;
> > > > > -	} while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
> > > > > +	} while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
> > > >
> > > > Hmm, do the later memory accesses need to be ordered against the store
> > > > part of the increment or just the read? If it's the former, then I don't
> > > > think that _acquire is sufficient -- accesses can still get in-between
> > > > the read and write parts of the RmW.
> > >
> > > I dug some more into this at the end of last week.
> > > For the SLAB_TYPESAFE_BY_RCU case, where we're racing inc_not_zero() with
> > > dec_and_test(), I think using _acquire() above is correct, as the
> > > later references can only move up into the critical section in the case
> > > that we successfully obtained a reference.
> > >
> > > However, if we're going to make the barriers implicit in the refcount
> > > operations here, then I think we also need to do something on the producer
> > > side for when the object is re-initialised after being destroyed and
> > > allocated again. I think that would necessitate release ordering for
> > > refcount_set() so that whatever allows the consumer to validate the
> > > object (e.g. sequence number) is published *before* the refcount.
> >
> > Thanks Will!
> > I would like to expand on your answer to provide an example of the
> > race that would happen without release ordering in the producer. To
> > save the reader's time I provide a simplified flow and reasoning first.
> > More detailed code for what I consider a typical SLAB_TYPESAFE_BY_RCU
> > refcounting example is included at the end of my reply (Addendum).
> > The simplified flow looks like this:
> >
> > consumer:
> >     obj = lookup(collection, key);
> >     if (!refcount_inc_not_zero(&obj->ref))
> >         return;
> >     smp_rmb(); /* Peter's new acquire fence */
> >     if (READ_ONCE(obj->key) != key) {
> >         put_ref(obj);
> >         return;
> >     }
> >     use(obj->value);
> >
> > producer:
> >     old_key = obj->key;
> >     remove(collection, old_key);
> >     if (!refcount_dec_and_test(&obj->ref))
> >         return;
> >     obj->key = KEY_INVALID;
> >     free(obj);
> >     ...
> >     obj = malloc(); /* obj is reused */
> >     obj->key = new_key;
> >     obj->value = new_value;
> >     smp_wmb(); /* Will's proposed release fence */
> >     refcount_set(&obj->ref, 1);
> >     insert(collection, key, obj);
> >
> > Let's consider the case where new_key == old_key; we'll call both of them
> > "key". Without Will's proposed fence the following reordering is
> > possible:
> >
> > consumer:
> >     obj = lookup(collection, key);
> >
> > producer:
> >     key = obj->key;
> >     remove(collection, key);
> >     if (!refcount_dec_and_test(&obj->ref))
> >         return;
> >     obj->key = KEY_INVALID;
> >     free(obj);
> >     obj = malloc(); /* obj is reused */
> >     refcount_set(&obj->ref, 1);
> >     obj->key = key; /* same key */
> >
> > consumer (continued):
> >     if (!refcount_inc_not_zero(&obj->ref))
> >         return;
> >     smp_rmb();
> >     if (READ_ONCE(obj->key) != key) {
> >         put_ref(obj);
> >         return;
> >     }
> >     use(obj->value);
> >
> > producer (continued):
> >     obj->value = new_value; /* reordered store */
> >     insert(collection, key, obj);
> >
> > So, the consumer finds the old object, successfully takes a refcount
> > and validates the key. The validation succeeds because the object is
> > allocated and has the same key, which is fine. However, it then proceeds
> > to use a stale obj->value. Will's proposed release ordering would prevent
> > that.
> >
> > The example in https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
> > omits all these ordering issues for SLAB_TYPESAFE_BY_RCU.
> > I think it would be better to introduce two new functions:
> > refcount_add_not_zero_acquire() and refcount_set_release(), and clearly
> > document that they should be used when a freed object can be recycled
> > and reused, as in the SLAB_TYPESAFE_BY_RCU case. refcount_set_release()
> > should also clarify that once it's called, the object can be accessed
> > by consumers even if it has not yet been added to the collection used for
> > object lookup (like in the example above).
> > The SLAB_TYPESAFE_BY_RCU comment at
> > https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
> > can then explicitly use these new functions in the example code,
> > further clarifying their purpose and proper use.
> > WDYT?
>
> Hi Peter,
> Should I take a stab at preparing a patch to add the two new
> refcounting functions suggested above, with updates to the
> documentation and comments?
> If you disagree with that or need more time to decide, then I'll wait.
> Please let me know.

Not sure if "--in-reply-to" worked, but I just posted a patch adding new
refcounting APIs for SLAB_TYPESAFE_BY_RCU here:
https://lore.kernel.org/all/20250206025201.979573-1-surenb@xxxxxxxxxx/

Since Peter seems to be busy, I discussed these ordering requirements for
SLAB_TYPESAFE_BY_RCU with Paul McKenney, and he was leaning towards having
separate functions with the additional fences for this case. That's what I
provided in my patch. Another possible option is to add acquire ordering in
__refcount_add_not_zero(), as Peter suggested, and add a refcount_set_release()
function.

Thanks,
Suren.

> Thanks,
> Suren.
>
> >
> > ADDENDUM.
> > Detailed code for typical use of refcounting with SLAB_TYPESAFE_BY_RCU:
> >
> > struct object {
> >     refcount_t ref;
> >     u64 key;
> >     u64 value;
> > };
> >
> > void init(struct object *obj, u64 key, u64 value)
> > {
> >     obj->key = key;
> >     obj->value = value;
> >     smp_wmb(); /* Will's proposed release fence */
> >     refcount_set(&obj->ref, 1);
> > }
> >
> > bool get_ref(struct object *obj, u64 key)
> > {
> >     if (!refcount_inc_not_zero(&obj->ref))
> >         return false;
> >     smp_rmb(); /* Peter's new acquire fence */
> >     if (READ_ONCE(obj->key) != key) {
> >         put_ref(obj);
> >         return false;
> >     }
> >     return true;
> > }
> >
> > void put_ref(struct object *obj)
> > {
> >     if (!refcount_dec_and_test(&obj->ref))
> >         return;
> >     obj->key = KEY_INVALID;
> >     free(obj);
> > }
> >
> > consumer:
> >     obj = lookup(collection, key);
> >     if (!get_ref(obj, key))
> >         return;
> >     use(obj->value);
> >
> > producer:
> >     remove(collection, old_obj->key);
> >     put_ref(old_obj);
> >     new_obj = malloc();
> >     init(new_obj, new_key, new_value);
> >     insert(collection, new_key, new_obj);
> >
> > With SLAB_TYPESAFE_BY_RCU, old_obj in the producer can be reused and end
> > up equal to new_obj.
> >
> > >
> > > Will
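
For reference, below is a sketch of how the Addendum example might look once
rewritten with the two helpers proposed in this thread. The helper names come
from the discussion above; their exact signatures (mirroring
refcount_add_not_zero() and refcount_set(), with the increment passed
explicitly) are an assumption here, since the final API had not been merged at
the time of this exchange.

struct object {
        refcount_t ref;
        u64 key;
        u64 value;
};

/* Producer: (re)initialise a possibly recycled object. */
void init(struct object *obj, u64 key, u64 value)
{
        obj->key = key;
        obj->value = value;
        /*
         * Release ordering: key and value are published before the refcount
         * becomes non-zero, so a lookup that wins the refcount race cannot
         * observe fields left over from the object's previous life.
         */
        refcount_set_release(&obj->ref, 1);
}

void put_ref(struct object *obj)
{
        if (!refcount_dec_and_test(&obj->ref))
                return;
        obj->key = KEY_INVALID;
        free(obj);
}

/* Consumer: speculative lookup, validated after taking the reference. */
bool get_ref(struct object *obj, u64 key)
{
        /*
         * Acquire ordering: the key check below (and all later accesses to
         * *obj) cannot be reordered before the increment.
         */
        if (!refcount_add_not_zero_acquire(1, &obj->ref))
                return false;
        if (READ_ONCE(obj->key) != key) {
                /* The object was recycled for a different key; back off. */
                put_ref(obj);
                return false;
        }
        return true;
}

Compared with the Addendum version, the explicit smp_wmb()/smp_rmb() pair is
folded into the two helpers, which is exactly what giving them release and
acquire semantics is meant to achieve.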