Re: [PATCH v4 4/4] blk-flush: reuse rq queuelist in flush state machine

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 28/05/2024 11:09, Chengming Zhou wrote:
> On 2024/5/28 16:42, Friedrich Weber wrote:
>> Hope I did this correctly. With this, the reproducer triggered a BUG
>> pretty quickly, see [0]. If I can provide anything else, just let me know.
> 
> Thanks for your patience, it's correct. Then how about this fix?
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index d98654869615..b2ec5c4c738f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -485,6 +485,7 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
>         if (data->nr_tags > 1) {
>                 rq = __blk_mq_alloc_requests_batch(data);
>                 if (rq) {
> +                       INIT_LIST_HEAD(&rq->queuelist);
>                         blk_mq_rq_time_init(rq, alloc_time_ns);
>                         return rq;
>                 }
> 

Nice, seems like with this patch applied on top of 6.9, the reproducer
does not trigger crashes anymore for me! Thanks!

To verify that the reproducer hits the new INIT_LIST_HEAD, I added debug
prints before/after:

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4da581f13273..75186bb0d9c9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -485,7 +485,9 @@ static struct request
*__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
        if (data->nr_tags > 1) {
                rq = __blk_mq_alloc_requests_batch(data);
                if (rq) {
+                       pr_warn("before init: list: %p next: %p prev:
%p\n", &rq->queuelist, rq->queuelist.next, rq->queuelist.prev);
                        INIT_LIST_HEAD(&rq->queuelist);
+                       pr_warn("after init: list: %p next: %p prev:
%p\n", &rq->queuelist, rq->queuelist.next, rq->queuelist.prev);
                        blk_mq_rq_time_init(rq, alloc_time_ns);
                        return rq;
                }

And indeed, I see quite some printouts where rq->queuelist.next differs
before/after the request, e.g.

May 28 16:31:25 reproflushfull kernel: before init: list:
000000001e0a144f next: 00000000aaa2e372 prev: 000000001e0a144f
May 28 16:31:25 reproflushfull kernel: after init: list:
000000001e0a144f next: 000000001e0a144f prev: 000000001e0a144f
May 28 16:31:26 reproflushfull kernel: before init: list:
000000001e0a144f next: 00000000aaa2e372 prev: 000000001e0a144f
May 28 16:31:26 reproflushfull kernel: after init: list:
000000001e0a144f next: 000000001e0a144f prev: 000000001e0a144f

I know very little about the block layer, but if I understand the commit
message of the original 81ada09cc25e correctly, it's expected that
queuelist needs to be reinitialized at some places. I'm just a little
confused to see the same pointer 00000000aaa2e372 in two subsequent
"before init" printouts for the same queuelist 000000001e0a144f. Is this
expected too?

Also, just out of interest: Can you estimate whether this issue is
specific to software RAID setups, or could similar NULL pointer
dereferences also happen in setups without software RAID?

Best wishes,

Friedrich





[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux