I have also now tested this v2 patch and can confirm that it also
fixes the race in fscache that we were reliably able to reproduce with
our (re-export) workloads.

Tested-by: Daire Byrne <daire@xxxxxxxx>

Daire

On Thu, 17 Nov 2022 at 14:30, Dave Wysochanski <dwysocha@xxxxxxxxxx> wrote:
>
> If a cookie expires from the LRU and the LRU_DISCARD flag is set,
> but the state machine has not run yet, it's possible another thread
> can call fscache_use_cookie and begin to use it. When the
> cookie_worker finally runs, it will see the LRU_DISCARD flag set,
> transition the cookie->state to LRU_DISCARDING, which will then
> withdraw the cookie. Once the cookie is withdrawn, the object is
> removed and the oops below occurs because the object associated
> with the cookie is now NULL.
>
> Fix the oops by clearing the LRU_DISCARD bit if another thread
> uses the cookie before the cookie_worker runs.
>
> BUG: kernel NULL pointer dereference, address: 0000000000000008
> ...
> CPU: 31 PID: 44773 Comm: kworker/u130:1 Tainted: G E 6.0.0-5.dneg.x86_64 #1
> Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
> Workqueue: events_unbound netfs_rreq_write_to_cache_work [netfs]
> RIP: 0010:cachefiles_prepare_write+0x28/0x90 [cachefiles]
> ...
> Call Trace:
>  netfs_rreq_write_to_cache_work+0x11c/0x320 [netfs]
>  process_one_work+0x217/0x3e0
>  worker_thread+0x4a/0x3b0
>  ? process_one_work+0x3e0/0x3e0
>  kthread+0xd6/0x100
>  ? kthread_complete_and_exit+0x20/0x20
>  ret_from_fork+0x1f/0x30
>
> Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning")
> Reported-by: Daire Byrne <daire.byrne@xxxxxxxxx>
> Signed-off-by: Dave Wysochanski <dwysocha@xxxxxxxxxx>
> ---
>  fs/fscache/cookie.c            | 8 ++++++++
>  include/trace/events/fscache.h | 2 ++
>  2 files changed, 10 insertions(+)
>
> diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
> index 451d8a077e12..bce2492186d0 100644
> --- a/fs/fscache/cookie.c
> +++ b/fs/fscache/cookie.c
> @@ -605,6 +605,14 @@ void __fscache_use_cookie(struct fscache_cookie *cookie, bool will_modify)
>  			set_bit(FSCACHE_COOKIE_DO_PREP_TO_WRITE, &cookie->flags);
>  			queue = true;
>  		}
> +		/*
> +		 * We could race with cookie_lru which may set LRU_DISCARD bit
> +		 * but has yet to run the cookie state machine. If this happens
> +		 * and another thread tries to use the cookie, clear LRU_DISCARD
> +		 * so we don't end up withdrawing the cookie while in use.
> +		 */
> +		if (test_and_clear_bit(FSCACHE_COOKIE_DO_LRU_DISCARD, &cookie->flags))
> +			fscache_see_cookie(cookie, fscache_cookie_see_lru_discard_clear);
>  		break;
>
>  	case FSCACHE_COOKIE_STATE_FAILED:
> diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
> index c078c48a8e6d..a6190aa1b406 100644
> --- a/include/trace/events/fscache.h
> +++ b/include/trace/events/fscache.h
> @@ -66,6 +66,7 @@ enum fscache_cookie_trace {
>  	fscache_cookie_put_work,
>  	fscache_cookie_see_active,
>  	fscache_cookie_see_lru_discard,
> +	fscache_cookie_see_lru_discard_clear,
>  	fscache_cookie_see_lru_do_one,
>  	fscache_cookie_see_relinquish,
>  	fscache_cookie_see_withdraw,
> @@ -149,6 +150,7 @@ enum fscache_access_trace {
>  	EM(fscache_cookie_put_work, "PQ work ") \
>  	EM(fscache_cookie_see_active, "- activ") \
>  	EM(fscache_cookie_see_lru_discard, "- x-lru") \
> +	EM(fscache_cookie_see_lru_discard_clear,"- lrudc") \
>  	EM(fscache_cookie_see_lru_do_one, "- lrudo") \
>  	EM(fscache_cookie_see_relinquish, "- x-rlq") \
>  	EM(fscache_cookie_see_withdraw, "- x-wth") \
> --
> 2.31.1
>
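
For anyone following the thread, the effect of the added
test_and_clear_bit() can be modelled in userspace. What follows is only a
rough sketch under assumptions, not the fscache code: every demo_* name is
hypothetical, C11 atomics stand in for the kernel bitops, and the state
machine is reduced to a single flag. It shows that when a user reaches the
cookie before the worker does, the pending discard is cancelled and the
worker finds nothing to withdraw.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number; the real flag is FSCACHE_COOKIE_DO_LRU_DISCARD. */
enum { DEMO_DO_LRU_DISCARD = 0 };

/* Hypothetical, heavily reduced stand-in for struct fscache_cookie. */
struct demo_cookie {
	atomic_ulong flags;	/* pending-action bits */
	int n_active;		/* simplified user count */
};

/* Userspace analogue of set_bit(): atomically set bit nr. */
static void demo_set_bit(int nr, atomic_ulong *addr)
{
	atomic_fetch_or(addr, 1UL << nr);
}

/*
 * Userspace analogue of test_and_clear_bit(): atomically clear bit nr
 * and report whether it was set beforehand.
 */
static bool demo_test_and_clear_bit(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_and(addr, ~mask) & mask) != 0;
}

/* LRU expiry: only requests a discard, the worker acts on it later. */
static void demo_lru_expire(struct demo_cookie *cookie)
{
	demo_set_bit(DEMO_DO_LRU_DISCARD, &cookie->flags);
}

/*
 * A new user arrives. Mirrors the idea of the patch: cancel any discard
 * that was requested but not yet processed. Returns true if one was
 * cancelled.
 */
static bool demo_use_cookie(struct demo_cookie *cookie)
{
	cookie->n_active++;
	return demo_test_and_clear_bit(DEMO_DO_LRU_DISCARD, &cookie->flags);
}

/* Worker: returns true if it would go on to withdraw the cookie. */
static bool demo_worker_would_withdraw(struct demo_cookie *cookie)
{
	return demo_test_and_clear_bit(DEMO_DO_LRU_DISCARD, &cookie->flags);
}
```

Calling demo_lru_expire(), then demo_use_cookie(), then
demo_worker_would_withdraw() on a zero-initialised demo_cookie reproduces
the ordering described in the commit message. Without the clear in
demo_use_cookie(), the last call would report a withdrawal while the
cookie is in use.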