On 6/14/22 17:54, Jann Horn wrote:
> On Tue, Jun 14, 2022 at 10:23 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>> > 			stat(s, DEACTIVATE_BYPASS);
>> > 			goto new_slab;
>> > @@ -2968,6 +2969,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>> > 	freelist = c->freelist;
>> > 	c->slab = NULL;
>> > 	c->freelist = NULL;
>>
>> Previously these were part of deactivate_slab(), which does that at the
>> very end, but also without bumping the tid.
>> I just wonder if it's necessary here too, because IIUC the scenario you
>> described relies on the missing bump above. This alone doesn't cause the
>> c->slab vs c->freelist mismatch?
>
> It's a different scenario, but at least in the current version, the
> ALLOC_NODE_MISMATCH case jumps straight to the deactivate_slab label,
> which takes the local_lock, grabs the old c->freelist, NULLs out
> ->slab and ->freelist, then drops the local_lock again. If the
> c->freelist was non-NULL, then this will prevent concurrent cmpxchg
> success; but there is no reason why c->freelist has to be non-NULL
> here. So if c->freelist is already NULL, we basically just take the
> local_lock, set c->slab to NULL, and drop the local_lock. And IIUC the

Ah, right. Thanks for the explanation. For the archives, I've spelled
the interleaving out in a small model at the end of this mail.

> local_lock is the only protection we have here against concurrency,
> since the slub_get_cpu_ptr() in __slab_alloc() only disables
> migration?

On PREEMPT_RT it disables migration, but on !PREEMPT_RT it's a plain
get_cpu_ptr() that does preempt_disable(). That's an implementation
detail, though: disabling migration would be sufficient on !PREEMPT_RT
too, but right now disabling preemption is cheaper there. (Both
variants are quoted below.)

> So again a concurrent fastpath free should be able to set
> c->freelist to non-NULL after c->slab has been set to NULL.
>
> So I think this TID bump is also necessary for correctness in the
> current version.

OK. (The resulting shape of the deactivation sites is sketched at the
end of this mail too.)

> And looking back at older kernels, back to at least 4.9, the
> ALLOC_NODE_MISMATCH case looks similarly broken - except that again,
> as you pointed out, we don't have the fine-grained locking, so it only
> becomes racy if we hit new_slab_objects() -> new_slab() ->
> allocate_slab() and then either we do local_irq_enable() or the
> allocation fails.
>
>> Thanks. Applying to slab/for-5.19-rc3/fixes branch.
>
> Thanks!
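
To make the race easier to see, here is a small userspace model of the
(freelist, tid) protocol that replays the bad interleaving
deterministically. This is only an illustration, not kernel code: the
names loosely mirror mm/slub.c, and this_cpu_cmpxchg_double() is modeled
by packing a freelist index and the tid into one atomic 64-bit word.

/* tid_race.c - toy model of the SLUB (freelist, tid) fastpath protocol.
 * Replays the racy interleaving once without and once with the tid bump
 * at slab deactivation. Illustration only, not actual kernel code. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Model this_cpu_cmpxchg_double(freelist, tid) by packing a freelist
 * index (0 == NULL) and the tid into a single atomic 64-bit word. */
static _Atomic uint64_t freelist_tid;
static void *cpu_slab;	/* models c->slab; NOT covered by the cmpxchg */

static uint64_t pack(uint32_t fl, uint32_t tid)
{
	return ((uint64_t)fl << 32) | tid;
}

static bool cmpxchg_double(uint32_t old_fl, uint32_t old_tid,
			   uint32_t new_fl, uint32_t new_tid)
{
	uint64_t expected = pack(old_fl, old_tid);

	return atomic_compare_exchange_strong(&freelist_tid, &expected,
					      pack(new_fl, new_tid));
}

int main(void)
{
	char slab_s1;	/* stand-in for the old cpu slab S1 */

	for (int fixed = 0; fixed <= 1; fixed++) {
		/* Initial state: c->slab = S1, percpu freelist empty. */
		cpu_slab = &slab_s1;
		atomic_store(&freelist_tid, pack(0, 100));

		/* Fastpath free: snapshot tid and freelist, observe that
		 * the object's slab == c->slab (both S1)... then get
		 * preempted before the cmpxchg. */
		uint64_t snap = atomic_load(&freelist_tid);
		uint32_t tid = (uint32_t)snap;
		uint32_t fl = (uint32_t)(snap >> 32);

		/* Slowpath deactivation: c->slab = NULL; c->freelist was
		 * already NULL, so only c->slab changes. The buggy version
		 * leaves the tid alone; the fixed one bumps it. */
		cpu_slab = NULL;
		if (fixed)
			atomic_store(&freelist_tid, pack(fl, tid + 1));

		/* Fastpath free resumes: tries to push object #7 onto the
		 * percpu freelist of what it still thinks is S1. */
		bool ok = cmpxchg_double(fl, tid, 7, tid + 1);

		printf("%-7s tid bump: fastpath free %s\n",
		       fixed ? "with" : "without",
		       ok ? "SUCCEEDS although c->slab went away (bad)"
			  : "fails and falls back to the slowpath (good)");
	}
	return 0;
}

Built with e.g. gcc -std=c11, the first (buggy) replay shows the cmpxchg
succeeding against a stale c->slab, which is exactly the window the
added tid bumps close.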
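
For reference, the slub_get_cpu_ptr() / slub_put_cpu_ptr() pair
mentioned above, quoted from memory of the 5.19-era mm/slub.c (so treat
it as approximate); get_cpu_ptr() is the generic percpu helper that
does preempt_disable():

#ifdef CONFIG_PREEMPT_RT
/*
 * On PREEMPT_RT we only disable migration: the slowpath must stay on
 * this CPU's data, but it takes a sleeping local_lock and so must
 * remain preemptible.
 */
#define slub_get_cpu_ptr(var)		\
({					\
	migrate_disable();		\
	this_cpu_ptr(var);		\
})
#define slub_put_cpu_ptr(var)		\
do {					\
	(void)(var);			\
	migrate_enable();		\
} while (0)
#else
/* On !PREEMPT_RT, get_cpu_ptr() disables preemption, which is cheaper. */
#define slub_get_cpu_ptr(var)	get_cpu_ptr(var)
#define slub_put_cpu_ptr(var)	put_cpu_ptr(var)
#endif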
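
Finally, the shape both deactivation sites in ___slab_alloc() end up
with after the patch. This is a sketch rather than the exact hunks; the
key point is that the tid bump happens under the local_lock, together
with clearing c->slab:

	local_lock_irqsave(&s->cpu_slab->lock, flags);
	/* ... */
	freelist = c->freelist;
	c->slab = NULL;
	c->freelist = NULL;
	/*
	 * Invalidate any tid snapshot a concurrent fastpath may hold,
	 * so its cmpxchg_double() fails once c->slab has changed.
	 */
	c->tid = next_tid(c->tid);
	local_unlock_irqrestore(&s->cpu_slab->lock, flags);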