Hi Chris,

On Thu, Oct 03, 2019 at 02:01:13PM +0000, Chris Mason wrote:
> 
> 
> On 3 Oct 2019, at 4:41, Gao Xiang wrote:
> 
> > Hi,
> >
> > On Thu, Oct 03, 2019 at 04:40:22PM +1000, Dave Chinner wrote:
> >> [cc linux-fsdevel, linux-block, tejun ]
> >>
> >> On Wed, Oct 02, 2019 at 06:52:47PM -0700, Darrick J. Wong wrote:
> >>> Hi everyone,
> >>>
> >>> Does anyone /else/ see this crash in generic/299 on a V4 filesystem
> >>> (tho afaict V5 configs crash too) and a 5.4-rc1 kernel? It seems to
> >>> pop up on generic/299 though only 80% of the time.
> >>>
> >
> > Just a quick glance, I guess there could is a race between (complete
> > guess):
> >
> >
> > static void finish_writeback_work(struct bdi_writeback *wb,
> >                                   struct wb_writeback_work *work)
> > {
> >         struct wb_completion *done = work->done;
> >
> >         if (work->auto_free)
> >                 kfree(work);
> >         if (done && atomic_dec_and_test(&done->cnt))
> >
> > ^^^ here
> >
> >                 wake_up_all(done->waitq);
> > }
> >
> > since new wake_up_all(done->waitq); is completely on-stack,
> >         if (done && atomic_dec_and_test(&done->cnt))
> > -               wake_up_all(&wb->bdi->wb_waitq);
> > +               wake_up_all(done->waitq);
> >  }
> >
> > which could cause use after free if on-stack wb_completion is gone...
> > (however previous wb->bdi is solid since it is not on-stack)
> >
> > see generic on-stack completion which takes a wait_queue spin_lock
> > between test and wake_up...
> >
> > If I am wrong, ignore me, hmm...
> 
> It's a good guess ;) Jens should have this queued up already:
> 
> https://lkml.org/lkml/2019/9/23/972

Oh, I didn't notice that; it's great that it's already resolved. :)

It wasn't entirely a guess, though. We once had a similar pattern at
a very early stage last year (a given IO balance counter plus a
wait_queue, since a completion was too heavy), which was resolved in
commit 848bd9acdcd0. So I'm familiar with cases like this.

I just read the mailing list regularly and try to be of some help
here...
Sorry about the noise...

Thanks,
Gao Xiang

> 
> -chris
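For illustration, here is a minimal userspace sketch of the race discussed in this thread. It assumes pthreads and C11 atomics as stand-ins for the kernel's wait queues and atomic_t; struct waitq, struct completion_counter, finish_work(), wait_for_work() and run_batch() are hypothetical names, not kernel APIs. The counter lives on the waiter's stack like wb_completion, while the wait queue is long-lived like bdi->wb_waitq. The sketch avoids the use-after-free by copying the waitq pointer out of the on-stack object before the decrement, so nothing dereferences it afterwards. The patch Jens queued may well take a different approach; this only demonstrates the pattern.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel wait_queue_head_t. Like
 * bdi->wb_waitq it is long-lived and outlives every waiter. */
struct waitq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
};

/* Hypothetical stand-in for struct wb_completion: on the waiter's stack. */
struct completion_counter {
	atomic_int cnt;          /* outstanding work items */
	struct waitq *waitq;     /* points at the long-lived queue */
};

static struct waitq shared_wq = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER
};

static void waitq_wake_all(struct waitq *wq)
{
	pthread_mutex_lock(&wq->lock);
	pthread_cond_broadcast(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
}

/* Completer side. The racy form is
 *     if (atomic_dec_and_test(&done->cnt)) wake_up_all(done->waitq);
 * since the waiter may return (and its stack frame vanish) right after
 * the decrement, before done->waitq is read. Loading the pointer first
 * means *done is never touched after the decrement. */
static void finish_work(struct completion_counter *done)
{
	if (!done)
		return;
	struct waitq *wq = done->waitq;           /* *done is still valid here */
	if (atomic_fetch_sub(&done->cnt, 1) == 1) /* we dropped the last count */
		waitq_wake_all(wq);                   /* touches only the shared queue */
}

/* Waiter side: sleep until every work item has called finish_work().
 * The count is checked under the queue lock, so a wakeup cannot be lost. */
static void wait_for_work(struct completion_counter *done)
{
	struct waitq *wq = done->waitq;
	pthread_mutex_lock(&wq->lock);
	while (atomic_load(&done->cnt) != 0)
		pthread_cond_wait(&wq->cond, &wq->lock);
	pthread_mutex_unlock(&wq->lock);
}

static void *worker(void *arg)
{
	finish_work(arg);
	return NULL;
}

/* Starts n workers against an on-stack counter, waits for them, and
 * returns the final count (0 once every worker has finished). */
static int run_batch(int n)
{
	pthread_t tids[16];
	struct completion_counter done = { .waitq = &shared_wq };

	assert(n >= 0 && n <= 16);
	atomic_init(&done.cnt, n);
	for (int i = 0; i < n; i++) {
		int rc = pthread_create(&tids[i], NULL, worker, &done);
		assert(rc == 0);
		(void)rc;
	}
	wait_for_work(&done);
	/* Joining only tears the threads down; the wakeup does not rely on it. */
	for (int i = 0; i < n; i++)
		pthread_join(tids[i], NULL);
	return atomic_load(&done.cnt);
}
```

The other option mentioned in the thread, holding the wait-queue lock across both the test and the wakeup as the generic completion does, would also work, at the cost of taking the lock on every decrement rather than only on the last one.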