Hi Peter, Paul, Jeff Layton is seeing a weird failed wake in the fscache rewrite. Can you have a look see if I've mucked up somewhere? For reference, Jeff's branch is here: https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git ceph-fscache-iter-experimental The situation is that the fscache_cookie struct has a member called 'stage' that indicates the state of a cookie representing a data file. As a cache object is set up, it goes through the LOOKING_UP stage (3) whilst the disk is being queried to the ACTIVE stage (5) when the backing file has been opened - at which point we can do I/O on it. Each time the stage is changed, we call wake_up_var() to ping any waiters on the variable. Waiters use wait_var_event() and co. to wait on it. We've added some tracepoints. There's a function which begins an operation on the cache. It waits for the cache to get into the appropriate stage as part of that. Around the bit doing the wait in we have: int __fscache_begin_operation(...) { ... wait_and_validate: spin_unlock(&cookie->lock); cookie->trace = true; trace_fscache_stage_wait(cookie, stage, fscache_cookie_stage_wait_begin_op); timeo = wait_var_event_timeout(&cookie->stage, cookie_stage_changed(cookie, stage), 10*HZ); if (timeo <= 1) { pr_warn("%s: cookie stage change wait timed out: cookie->stage=%u stage=%u", __func__, READ_ONCE(cookie->stage), stage); fscache_print_cookie(cookie, 'O'); trace_fscache_stage_wait(cookie, stage, fscache_cookie_stage_wait_fail); } else { trace_fscache_stage_wait(cookie, stage, fscache_cookie_stage_waited); } cookie->trace = false; goto again; ... } and cookie_stage_changed() looks like: static bool cookie_stage_changed(struct fscache_cookie *cookie, enum fscache_cookie_stage stage) { bool changed = (READ_ONCE(cookie->stage) != stage); if (!changed) trace_fscache_stage_wait(cookie, stage, fscache_cookie_stage_waiting); return changed; } At the point the wakeup happens, we have: void fscache_set_cookie_stage(struct fscache_cookie *cookie, enum fscache_cookie_stage stage) { bool changed = false, trace; spin_lock(&cookie->lock); switch (cookie->stage) { case FSCACHE_COOKIE_STAGE_INDEX: case FSCACHE_COOKIE_STAGE_WITHDRAWING: case FSCACHE_COOKIE_STAGE_RELINQUISHING: break; default: trace = cookie->trace; if (trace) trace_fscache_stage_set(cookie, stage, fscache_cookie_stage_set_general); cookie->stage = stage; changed = true; break; } spin_unlock(&cookie->lock); if (changed) { wake_up_var(&cookie->stage); if (trace) trace_fscache_stage_set(cookie, stage, fscache_cookie_stage_woken); } } In dmesg, we see: [ 2977.269392] FS-Cache: __fscache_begin_operation: cookie stage change wait timed out: cookie->stage=5 stage=3 [ 2977.269398] FS-Cache: O-cookie c=000dd7d9 [p=000dce05 fl=0 nc=0 na=2] [ 2977.276673] FS-Cache: O-cookie d=CEPH.in [ 2977.277796] FS-Cache: O-cookie o=201635 [ 2977.278552] FS-Cache: O-key=[16] '352b930000010000feffffffffffffff' In the trace log, we see: <...>-425461 [015] .... 2967.107716: fscache_stage_wait: c=000dd7d9 WAIT beginop now=3 arg=3 kfsc/7-896 [005] .... 2967.107717: fscache_stage_set: c=000dd7d9 SET general now=3 arg=5 <...>-425461 [015] .... 2967.107718: fscache_stage_wait: c=000dd7d9 WAITING now=3 arg=3 <...>-425461 [015] .... 2967.107718: fscache_stage_wait: c=000dd7d9 WAITING now=5 arg=3 kfsc/7-896 [005] .... 2967.107719: fscache_stage_set: c=000dd7d9 WOKEN now=5 arg=5 <...>-425461 [015] .N.. 2977.452925: fscache_stage_wait: c=000dd7d9 WAIT failed now=5 arg=3 So we can see the wait beginning, with the stage at LOOKING_UP (3). Note that we're actually waiting for stage!=3. Next, on cpu 5, we're about to set the stage ("SET" line), and then we've completed the wakeup 2uS later ("WOKEN" line). Whilst the stage-change is happening, the wait condition check happens twice in quick succession (two "WAITING" lines). The first is presumably right at the beginning of wait_var_event() and the second is inside the wait loop. In the first check, the stage is still 3, but in the second call of the condition function, it reads it as 3 (still LOOKING_UP), and so meets the condition to do the tracepoint - which then sees it as 5 (ACTIVE), but this doesn't affect the result of the condition function. [*] Note that in the traceline, cookie->stage as read inside the tracepoint is the "now" number and the second is fed in by the caller. wait_var_event()/wait_var_event_timeout() then goes to sleep - which it shouldn't, because either the task state should've been cleared, or the condition function should prevent it. Any thoughts on how to debug this further? Or if I've just done something silly? Also, do I need to use WRITE_ONCE() or smp_store_release() when setting cookie->store, since I'm reading it outside of the spinlock, albeit through READ_ONCE(). Thanks, David -- Linux-cachefs mailing list Linux-cachefs@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cachefs