On 2024/5/31 14:17, Christoph Hellwig wrote:
> On Wed, May 29, 2024 at 04:50:02PM +0800, Chengming Zhou wrote:
>> Yes, because we use list_move_tail() in the flush sequences. Maybe we can
>> just use list_add_tail() so we don't need the queuelist initialized. It
>> should be ok since rq can't be on any list when PREFLUSH or POSTFLUSH,
>> so there isn't any move actually.
>
> Sounds good.

OK, I can send a fix that switches to list_add_tail() later.

>
>> But now I'm concerned that rq->queuelist maybe changed by driver after
>> request end?
>
> How could the driver change it?

I don't know the drivers well. Normally, they detach rq->queuelist from
their internal list and then call blk_mq_end_request(), where we reuse
this queuelist to add the rq to the post-flush list.

Strictly speaking, the rq still belongs to the driver until it calls
blk_mq_free_request(), right? So I'm not sure whether any driver touches
rq->queuelist after blk_mq_end_request(). If no driver does that, then
we are good.

>
>>> Also, just out of interest: Can you estimate whether this issue is
>>> specific to software RAID setups, or could similar NULL pointer
>>> dereferences also happen in setups without software RAID?
>>
>> I think it can also happen without software RAID.
>
> Seems to be about batch allocation. So you either need a plug in
> the stacking device, or io_uring. I guess people aren't using the
> io_uring high performance options on devices with a write cache
> all that much, as that should immediately reproduce the problem.
>
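To illustrate why switching from list_move_tail() to list_add_tail() removes the need for an initialized queuelist, here is a minimal userspace sketch. It is a simplified re-implementation modeled on the kernel's list_head helpers (no poisoning, no CONFIG_DEBUG_LIST checks), not the kernel code itself. The assert() in list_move_tail() and the list_count() helper are hypothetical additions for this sketch only. The point it shows: list_add_tail() only writes the entry's next/prev pointers, while list_move_tail() first unlinks the entry and therefore reads those pointers. A zeroed, never-initialized list_head would make that unlink a NULL pointer dereference, which the thread suggests can happen with batch-allocated requests.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace copy of the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

/* Make @list an empty list: it points to itself in both directions. */
static void INIT_LIST_HEAD(struct list_head *list)
{
    list->next = list;
    list->prev = list;
}

/* Link @entry between two consecutive nodes. This only WRITES @entry's
 * pointers and never reads them, so @entry may be zeroed or stale. */
static void __list_add(struct list_head *entry, struct list_head *prev,
                       struct list_head *next)
{
    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
}

/* Append @entry at the tail of the list headed by @head. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    __list_add(entry, head->prev, head);
}

/* Unlink @entry by making its neighbours point at each other. This READS
 * entry->prev and entry->next; if they are NULL, it dereferences NULL.
 * For a self-linked (INIT_LIST_HEAD) entry it is a harmless no-op. */
static void __list_del_entry(struct list_head *entry)
{
    struct list_head *prev = entry->prev;
    struct list_head *next = entry->next;

    next->prev = prev;
    prev->next = next;
}

/* Unlink @entry from wherever it is, then append it to @head. */
static void list_move_tail(struct list_head *entry, struct list_head *head)
{
    /* Sketch-only check standing in for the crash the real helper would
     * hit on an uninitialized entry. */
    assert(entry->next != NULL && entry->prev != NULL);
    __list_del_entry(entry);
    list_add_tail(entry, head);
}

/* Hypothetical helper: count the entries on a list (excluding the head). */
static size_t list_count(const struct list_head *head)
{
    size_t n = 0;

    for (const struct list_head *p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

In short, list_add_tail() is safe for an entry that is known not to be on any list (the PREFLUSH/POSTFLUSH case described in the thread), whatever its pointers contain, whereas list_move_tail() is only safe once the entry has been initialized or is already linked somewhere.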