On Mon, 28 Aug 2023 at 16:40, Jann Horn <jannh@xxxxxxxxxx> wrote:
>
> On Sat, Aug 26, 2023 at 5:32 AM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> > On Fri, 25 Aug 2023 at 23:15, Jann Horn <jannh@xxxxxxxxxx> wrote:
> > > Currently, KASAN is unable to catch use-after-free in SLAB_TYPESAFE_BY_RCU
> > > slabs because use-after-free is allowed within the RCU grace period by
> > > design.
> > >
> > > Add a SLUB debugging feature which RCU-delays every individual
> > > kmem_cache_free() before either actually freeing the object or handing it
> > > off to KASAN, and change KASAN to poison freed objects as normal when this
> > > option is enabled.
> > >
> > > Note that this creates a 16-byte unpoisoned area in the middle of the
> > > slab metadata area, which kinda sucks but seems to be necessary in order
> > > to be able to store an rcu_head in there without triggering an ASAN
> > > splat during RCU callback processing.
> >
> > Nice!
> >
> > Can't we unpoison this rcu_head right before call_rcu() and repoison
> > it after receiving the callback?
>
> Yeah, I think that should work. It looks like currently
> kasan_unpoison() is exposed in include/linux/kasan.h but
> kasan_poison() is not, and its inline definition probably means I
> can't just move it out of mm/kasan/kasan.h into include/linux/kasan.h;
> do you have a preference for how I should handle this? Hmm, and it
> also looks like code outside of mm/kasan/ wouldn't know what the
> valid values for the "value" argument to kasan_poison() are anyway.
> I also have another feature idea that would benefit from having
> something like kasan_poison() available in include/linux/kasan.h, so I
> would prefer that over adding another special-case function inside
> KASAN for poisoning this piece of slab metadata...
>
> I guess I could define a wrapper around kasan_poison() in
> mm/kasan/generic.c that uses a new poison value for "some other part
> of the kernel told us to poison this area", and then expose that
> wrapper with a declaration in include/linux/kasan.h? Something like:
>
> void kasan_poison_outline(const void *addr, size_t size, bool init)
> {
>         kasan_poison(addr, size, KASAN_CUSTOM, init);
> }

Looks reasonable.

> > What happens on cache destruction?
> > Currently we purge quarantine on cache destruction to be able to
> > safely destroy the cache. I suspect we may need to somehow purge rcu
> > callbacks as well, or do something else.
>
> Ooh, good point, I hadn't thought about that... currently
> shutdown_cache() assumes that all the objects have already been freed,
> then puts the kmem_cache on a list for
> slab_caches_to_rcu_destroy_workfn(), which then waits with an
> rcu_barrier() until the slab's pages are all gone.

I guess this is what the test robot found as well.

> Luckily kmem_cache_destroy() is already a sleepable operation, so
> maybe I should just slap another rcu_barrier() in there for builds
> with this config option enabled... I think that should be fine for an
> option mostly intended for debugging.

This is definitely the simplest option.
I am a bit concerned about performance if massive cache destruction
happens (e.g. during destruction of a set of namespaces for a
container). Net namespace is slow to destroy for this reason IIRC;
there were some optimizations to batch rcu synchronization, and now we
are adding more.
But I don't see any reasonable faster option either. So I guess let's
do this now and optimize later (or not).