On Fri, 22 Apr 2022 at 00:07, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Thu, 14 Apr 2022 10:59:25 +0800 Zqiang <qiang1.zhang@xxxxxxxxx> wrote: > > > The kasan_quarantine_remove_cache() is called in kmem_cache_shrink()/ > > destroy(), the kasan_quarantine_remove_cache() call is protected by > > cpuslock in kmem_cache_destroy(), can ensure serialization with > > kasan_cpu_offline(). however the kasan_quarantine_remove_cache() call > > is not protected by cpuslock in kmem_cache_shrink(), when CPU going > > offline and cache shrink occur at same time, the cpu_quarantine may be > > corrupted by interrupt(per_cpu_remove_cache operation). so add > > cpu_quarantine offline flags check in per_cpu_remove_cache(). > > > > ... > > > > Could we please have some reviewer input here? This is very tricky, I think can follow this: Reviewed-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx> If q->offline is set, then kasan_cpu_offline() will or has already removed everything from cpu_quarantine and freed, so we can return early in per_cpu_remove_cache(). If kasan_cpu_offline() hasn't yet removed everything from cpu_quarantine already, it's actually problematic for the kmem_cache_destroy() case. But since both kmem_cache_destroy() and kasan_cpu_offline() are serialized by cpus lock, this case must not happen. > > --- a/mm/kasan/quarantine.c > > +++ b/mm/kasan/quarantine.c > > @@ -330,6 +330,8 @@ static void per_cpu_remove_cache(void *arg) > > struct cpu_shrink_qlist *sq; > > #endif > > q = this_cpu_ptr(&cpu_quarantine); > > + if (READ_ONCE(q->offline)) > > + return; > > #ifndef CONFIG_PREEMPT_RT > > qlist_move_cache(q, &to_free, cache); > > qlist_free_all(&to_free, cache); > > It might be helpful to have a little comment which explains why we're > doing this?