On Fri, Sep 16, 2022 at 03:15:05PM +0100, Matthew Wilcox wrote:
> On Fri, Sep 16, 2022 at 02:46:39AM -0700, Kees Cook wrote:
> > On Fri, Sep 16, 2022 at 09:38:33AM +0100, Matthew Wilcox wrote:
> > > On Thu, Sep 15, 2022 at 05:59:56PM -0600, Yu Zhao wrote:
> > > > I think this is a manifestation of the lockdep warning I reported
> > > > a couple of weeks ago:
> > > > https://lore.kernel.org/r/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@xxxxxxxxxxxxxx/
> > >
> > > That would certainly match the symptoms.
> > >
> > > Turning vmap_lock into an NMI-safe lock would be bad. I don't even
> > > know if we have primitives for that (it's not like you can disable
> > > an NMI ...)
> > >
> > > I don't quite have time to write a patch right now. Perhaps
> > > something like:
> > >
> > > struct vmap_area *find_vmap_area_nmi(unsigned long addr)
> > > {
> > > 	struct vmap_area *va;
> > >
> > > 	if (!spin_trylock(&vmap_area_lock))
> > > 		return NULL;
> > > 	va = __find_vmap_area(addr, &vmap_area_root);
> > > 	spin_unlock(&vmap_area_lock);
> > >
> > > 	return va;
> > > }
> > >
> > > and then call find_vmap_area_nmi() in check_heap_object(). I may
> > > have the polarity of the return value of spin_trylock() incorrect.
> >
> > I think we'll need something slightly tweaked, since this would
> > return NULL under any contention (and a NULL return is fatal in
> > check_heap_object()). It seems like we need to explicitly check
> > for being in NMI context in check_heap_object() to deal with it?
> > Like this (only build tested):
>
> Right, and Ulad is right about it being callable from any context.
> I think the long-term solution is to make the vmap_area_root tree
> walkable under RCU protection.
>
> For now, let's have a distinct return code (ERR_PTR(-EAGAIN),
> perhaps?) to indicate that we've hit contention. It generally won't
> matter if we hit it in process context because hardening doesn't
> have to be 100% reliable to be useful.
>
> Erm ... so what prevents this race:
>
> CPU 0				CPU 1
> copy_to_user()
> check_heap_object()
> area = find_vmap_area(addr)
> 				__purge_vmap_area_lazy()
> 				merge_or_add_vmap_area_augment()
> 				__merge_or_add_vmap_area()
> 				kmem_cache_free(vmap_area_cachep, va);
>
Sounds like it can happen. I think this is a good argument for
switching to RCU here, so that the va can be accessed safely after
the lock is released. I can think about it and put it on my todo
list.

Since it is not urgent, so far it is OK to wait for a splat. But it
might never happen :)

--
Uladzislau Rezki
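
For concreteness, here is a minimal sketch of the combined idea from
the thread above: a trylock lookup that distinguishes "no area" from
"lock contended", and a check_heap_object() caller that skips the
check on contention instead of aborting. The helper name
find_vmap_area_try_lock() and the exact call-site wiring are
illustrative assumptions, not a merged patch; __find_vmap_area(),
vmap_area_lock, vmap_area_root and usercopy_abort() are the existing
mm/vmalloc.c and mm/usercopy.c internals.

	/* Sketch only: non-sleeping lookup with a distinct contention
	 * code. Returns the area, NULL if no area covers @addr, or
	 * ERR_PTR(-EAGAIN) if vmap_area_lock could not be taken. */
	static struct vmap_area *find_vmap_area_try_lock(unsigned long addr)
	{
		struct vmap_area *va;

		if (!spin_trylock(&vmap_area_lock))
			return ERR_PTR(-EAGAIN);
		va = __find_vmap_area(addr, &vmap_area_root);
		spin_unlock(&vmap_area_lock);

		return va;
	}

	/* Sketch of the vmalloc branch in check_heap_object(), where
	 * addr == (unsigned long)ptr: */
	if (is_vmalloc_addr(ptr)) {
		struct vmap_area *area = find_vmap_area_try_lock(addr);
		unsigned long offset;

		/*
		 * Contended: we may be in NMI context, having interrupted
		 * a holder of vmap_area_lock, so the address cannot be
		 * looked up safely. Let the copy proceed unchecked.
		 */
		if (IS_ERR(area))
			return;
		if (!area)
			usercopy_abort("vmalloc", "no area", to_user, 0, n);
		if (n > area->va_end - addr) {
			offset = addr - area->va_start;
			usercopy_abort("vmalloc", NULL, to_user, offset, n);
		}
		return;
	}

In process context a spurious ERR_PTR(-EAGAIN) only means one copy
goes unchecked, which matches the reasoning above that hardening does
not have to be 100% reliable to be useful.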
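
On the RCU direction mentioned at the end, one way to close the
kmem_cache_free() race in the diagram would be to defer the free past
a grace period, so a lockless reader that has found a vmap_area can
keep dereferencing it until rcu_read_unlock(). This is only a sketch
under stated assumptions: the rcu_head member is hypothetical, the
struct is trimmed to the relevant fields, and it deliberately leaves
out the harder part, which is making the rb-tree walk itself safe for
lockless readers.

	/* Hypothetical addition to struct vmap_area (fields trimmed): */
	struct vmap_area {
		unsigned long va_start;
		unsigned long va_end;
		struct rb_node rb_node;
		struct list_head list;
		struct rcu_head rcu;	/* assumed: for deferred freeing */
	};

	static void vmap_area_free_rcu(struct rcu_head *head)
	{
		struct vmap_area *va = container_of(head, struct vmap_area, rcu);

		kmem_cache_free(vmap_area_cachep, va);
	}

	/*
	 * Used in place of the direct kmem_cache_free() in
	 * __merge_or_add_vmap_area(): the area is still unlinked under
	 * vmap_area_lock, but its memory is not reclaimed until a grace
	 * period has elapsed, so a reader inside rcu_read_lock() cannot
	 * hit the use-after-free shown in the diagram.
	 */
	static void free_vmap_area_deferred(struct vmap_area *va)
	{
		call_rcu(&va->rcu, vmap_area_free_rcu);
	}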