On Tue, Jun 14, 2022 at 10:23 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> On 6/8/22 20:22, Jann Horn wrote:
> > The fastpath in slab_alloc_node() assumes that c->slab is stable as long as
> > the TID stays the same. However, two places in __slab_alloc() currently
> > don't update the TID when deactivating the CPU slab.
> >
> > If multiple operations race the right way, this could lead to an object
> > getting lost; or, in an even more unlikely situation, it could even lead to
> > an object being freed onto the wrong slab's freelist, messing up the
> > `inuse` counter and eventually causing a page to be freed to the page
> > allocator while it still contains slab objects.
[...]
> > Fixes: c17dda40a6a4e ("slub: Separate out kmem_cache_cpu processing from deactivate_slab")
> > Fixes: 03e404af26dc2 ("slub: fast release on full slab")
> > Cc: stable@xxxxxxxxxxxxxxx
>
> Hmm these are old commits, and currently oldest LTS is 4.9, so this will be
> fun. Worth doublechecking if it's not recent changes that actually
> introduced the bug... but seems not, AFAICS.
[...]
> > @@ -2936,6 +2936,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> >
> >  	if (!freelist) {
> >  		c->slab = NULL;
> > +		c->tid = next_tid(c->tid);
> >  		local_unlock_irqrestore(&s->cpu_slab->lock, flags);
>
> So this immediate unlock after setting NULL is new from the 5.15 preempt-rt
> changes. However even in older versions we could goto new_slab,
> new_slab_objects(), new_slab(), allocate_slab(), where if
> (gfpflags_allow_blocking()) local_irq_enable(); (there's no extra disabled
> preemption besides the irq disable) so I'd say the bug was possible before
> too, but less often?

Yeah, I think so too.

> >  		stat(s, DEACTIVATE_BYPASS);
> >  		goto new_slab;
> > @@ -2968,6 +2969,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> >  	freelist = c->freelist;
> >  	c->slab = NULL;
> >  	c->freelist = NULL;
>
> Previously these were part of deactivate_slab(), which does that at the very
> end, but also without bumping tid.
> I just wonder if it's necessary too, because IIUC the scenario you described
> relies on the missing bump above. This alone doesn't cause the c->slab vs
> c->freelist mismatch?

It's a different scenario, but at least in the current version, the
ALLOC_NODE_MISMATCH case jumps straight to the deactivate_slab label,
which takes the local_lock, grabs the old c->freelist, NULLs out
->slab and ->freelist, then drops the local_lock again.
If the c->freelist was non-NULL, then this will prevent concurrent
cmpxchg success; but there is no reason why c->freelist has to be
non-NULL here.

So if c->freelist is already NULL, we basically just take the
local_lock, set c->slab to NULL, and drop the local_lock. And IIUC the
local_lock is the only protection we have here against concurrency,
since the slub_get_cpu_ptr() in __slab_alloc() only disables
migration? So again a concurrent fastpath free should be able to set
c->freelist to non-NULL after c->slab has been set to NULL.

So I think this TID bump is also necessary for correctness in the
current version.

And looking back at older kernels, back to at least 4.9, the
ALLOC_NODE_MISMATCH case looks similarly broken - except that again,
as you pointed out, we don't have the fine-grained locking, so it only
becomes racy if we hit new_slab_objects() -> new_slab() ->
allocate_slab() and then either we do local_irq_enable() or the
allocation fails.

> Thanks. Applying to slab/for-5.19-rc3/fixes branch.

Thanks!
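
P.S. For anyone reading along without mm/slub.c open: here is a minimal,
standalone sketch of the invariant being discussed. It is not the actual
SLUB code - cpu_slab_t, try_commit_free() and deactivate() are made-up
names, and the real fastpath commits the (freelist, tid) pair atomically
rather than with plain loads and stores - but it shows why every path
that tears down c->slab and c->freelist must also advance c->tid.

/* Hypothetical, simplified model - not the kernel's SLUB code. */
#include <stdbool.h>
#include <stddef.h>

typedef struct {
        void *slab;             /* current per-CPU slab, NULL when deactivated */
        void *freelist;         /* lockless per-CPU freelist */
        unsigned long tid;      /* transaction id guarding the fastpath */
} cpu_slab_t;

static unsigned long next_tid(unsigned long tid)
{
        return tid + 1;         /* the real next_tid() uses a larger stride */
}

/*
 * Free fastpath: snapshot (freelist, tid), then commit only if neither
 * has changed in the meantime (the kernel does this check atomically).
 */
static bool try_commit_free(cpu_slab_t *c, unsigned long snap_tid,
                            void *snap_freelist, void *object)
{
        if (c->tid != snap_tid || c->freelist != snap_freelist)
                return false;   /* state changed under us: slow path */
        c->freelist = object;   /* push the object onto the per-CPU freelist */
        c->tid = next_tid(c->tid);
        return true;
}

/*
 * Slab deactivation. Without the tid bump, a racing try_commit_free()
 * whose snapshot was taken before this point could still succeed and
 * attach an object to per-CPU state that no longer maps to any slab.
 */
static void deactivate(cpu_slab_t *c)
{
        c->slab = NULL;
        c->freelist = NULL;
        c->tid = next_tid(c->tid);      /* the fix being discussed */
}

int main(void)
{
        char slab_mem, obj;
        cpu_slab_t c = { .slab = &slab_mem, .freelist = NULL, .tid = 0 };

        /* A freeing CPU snapshots the per-CPU state... */
        unsigned long snap_tid = c.tid;
        void *snap_freelist = c.freelist;

        /* ...the slab is deactivated in the meantime... */
        deactivate(&c);

        /*
         * ...and the stale snapshot now fails, as it must. If deactivate()
         * skipped the tid bump, this commit would wrongly succeed.
         */
        return try_commit_free(&c, snap_tid, snap_freelist, &obj) ? 1 : 0;
}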