Hello mm-maintainers, I have a question about kfree semantics, I can't find answer in docs and opinions I hear differ. Namely, is it OK to pass non-acquired objects to kfree/kmem_cache_free? By non-acquired mean objects unsafely passed between threads without using proper release/acquire (wmb/rmb) memory barriers. The question arose during work on KernelThreadSanitizer, a kernel data race, and in particular caused by the following existing code: // kernel/pid.c if ((atomic_read(&pid->count) == 1) || atomic_dec_and_test(&pid->count)) { kmem_cache_free(ns->pid_cachep, pid); put_pid_ns(ns); } //drivers/tty/tty_buffer.c while ((next = buf->head->next) != NULL) { tty_buffer_free(port, buf->head); buf->head = next; } // Here another thread can concurrently append to the buffer list, and tty_buffer_free eventually calls kfree. Both these cases don't contain proper memory barrier before handing off the object to kfree. In my opinion the code should use smp_load_acquire or READ_ONCE_CTRL ("control-dependnecy-acquire"). Otherwise there can be pending memory accesses to the object in other threads that can interfere with slab code or the next usage of the object after reuse. Paul McKenney suggested that: " The maintainers probably want this sort of code to be allowed: p->a++; if (p->b) { kfree(p); p = NULL; } And the users even more so. So if the compiler really is free to reorder any scribbling/checking by the caller with any scribbling/checking by kfree(), that should be fixed in kfree() rather than in all the callers. " This does not look reasonable to me for 2 reasons: - this incurs unnecessary cost for all kfree users, kfree would have to execute a memory barrier always while most callers already have the object acquired (either single-threaded use, or mutex protected, or the object was properly handed off to the freeing thread) - as far as I understand if an object is unsafely passed between a chain of threads A->B->C->D and then the last does kfree, then kfree can't acquire visibility over the object by executing a memory barrier. All threads in the chain must play by the rules to properly hand off the object to kfree. As far as I understand, that's why atomic_dec_and_test used for reference counting contains full memory barrier; and also kfree does not seem to contain any memory barriers on fast path. Can you please clarify the rules here? Thank you -- Dmitry Vyukov, Software Engineer, dvyukov@xxxxxxxxxx Google Germany GmbH, Dienerstraße 12, 80331, München Geschäftsführer: Graham Law, Christine Elizabeth Flores Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Diese E-Mail ist vertraulich. Wenn Sie nicht der richtige Adressat sind, leiten Sie diese bitte nicht weiter, informieren Sie den Absender und löschen Sie die E-Mail und alle Anhänge. Vielen Dank. This e-mail is confidential. If you are not the right addressee please do not forward it, please inform the sender, and please erase this e-mail including any attachments. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href