On 06/02, Linus Torvalds wrote:
>
> On Fri, Jun 2, 2023 at 1:59 PM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> >
> > As I said from the very beginning, this code is fine on x86 because
> > atomic ops are fully serialised on x86.
>
> Yes. Other architectures require __smp_mb__{before,after}_atomic for
> the bit setting ops to actually be memory barriers.
>
> We *should* probably have acquire/release versions of the bit test/set
> helpers, but we don't, so they end up being full memory barriers with
> those things. Which isn't optimal, but I doubt it matters on most
> architectures.
>
> So maybe we'll some day have a "test_bit_acquire()" and a
> "set_bit_release()" etc.

In this particular case we need clear_bit_release() and iiuc it is
already here, just it is named clear_bit_unlock().

So do you agree that vhost_worker() needs smp_mb__before_atomic()
before clear_bit() or just clear_bit_unlock() to avoid the race with
vhost_work_queue() ?

Let me provide a simplified example:

	struct item {
		struct llist_node	llist;
		unsigned long		flags;
	};

	struct llist_head HEAD = {};	// global

	void queue(struct item *item)
	{
		// ensure this item was already flushed
		if (!test_and_set_bit(0, &item->flags))
			llist_add(&item->llist, &HEAD);
	}

	void flush(void)
	{
		struct llist_node *head = llist_del_all(&HEAD);
		struct item *item, *next;

		llist_for_each_entry_safe(item, next, head, llist)
			clear_bit(0, &item->flags);
	}

I think this code is buggy in that flush() can race with queue(), the
same way as vhost_worker() and vhost_work_queue().

Once flush() clears bit 0, queue() can come on another CPU and re-queue
this item and change item->llist.next. We need a barrier before
clear_bit() to ensure that next = llist_entry(item->llist.next) in
llist_for_each_entry_safe() completes before the result of clear_bit()
is visible to queue().

And, I do not think we can rely on control dependency because...
because I fail to see the load-store control dependency in this code,
llist_for_each_entry_safe() loads item->llist.next but doesn't check the
result until the next iteration.

No?

Oleg.
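
For illustration only, here is a userspace C11 sketch of the same
queue()/flush() pattern with the ordering made explicit. It is not kernel
code and the C11 memory model is not the Linux kernel memory model: the
plain `next` pointer, the `ITEM_QUEUED` mask, the return values, and the
use of atomic_fetch_and_explicit() with memory_order_release as a
stand-in for clear_bit_unlock() are all assumptions made for this
example. The point it tries to show is that the load of item->next in
flush() is ordered before the bit clear becomes visible, so a concurrent
queue() that observes the cleared bit cannot overwrite `next` too early.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define ITEM_QUEUED 1UL			/* bit 0 of item->flags (hypothetical name) */

struct item {
	struct item *next;		/* plain field, stands in for llist_node.next */
	atomic_ulong flags;
};

static _Atomic(struct item *) HEAD;	/* global lock-free list head */

/* Returns true if the item was added, false if it was still queued. */
static bool queue(struct item *item)
{
	/* Acquire pairs with the release in flush(). */
	unsigned long old = atomic_fetch_or_explicit(&item->flags, ITEM_QUEUED,
						     memory_order_acquire);
	if (old & ITEM_QUEUED)
		return false;

	struct item *first = atomic_load_explicit(&HEAD, memory_order_relaxed);
	do {
		/* This store must not overtake flush()'s load of ->next. */
		item->next = first;
	} while (!atomic_compare_exchange_weak_explicit(&HEAD, &first, item,
							memory_order_release,
							memory_order_relaxed));
	return true;
}

/* Returns the number of items flushed. */
static size_t flush(void)
{
	struct item *item = atomic_exchange_explicit(&HEAD, (struct item *)NULL,
						     memory_order_acquire);
	size_t n = 0;

	while (item) {
		/* Load ->next first; the release below orders it before the clear. */
		struct item *next = item->next;
		unsigned long old = atomic_fetch_and_explicit(&item->flags,
							      ~ITEM_QUEUED,
							      memory_order_release);
		assert(old & ITEM_QUEUED);	/* every listed item must be marked */
		(void)old;
		item = next;
		n++;
	}
	return n;
}
```

With memory_order_relaxed on the fetch_and instead, nothing in C11 would
order the `next` load before the clear, which is the userspace analogue
of the clear_bit() problem described above.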