Christoph Hellwig <hch@xxxxxx> writes: > On Mon, Aug 16, 2021 at 05:10:41PM +0800, Hillf Danton wrote: >> Remove and free all qos callbacks added, with cb->timer deleted in >> blk_stat_remove_callback(). >> >> only for thoughts. >> >> +++ x/block/blk-sysfs.c >> @@ -800,9 +800,7 @@ static void blk_release_queue(struct kob >> >> might_sleep(); >> >> - if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags)) >> - blk_stat_remove_callback(q, q->poll_cb); >> - blk_stat_free_callback(q->poll_cb); >> + rq_qos_exit(q); > > rq_qos_exit is already called in blk_cleanup_queue, and the blk-mq > pollig doesn't even use the qos framework. So I'm not sure what this > is supposed to help. I'm seeing a similar crash in our CI: [ 464.072042] nbd0: detected capacity change from 0 to 2097152 [ 464.092297] nbd0: p1 [ 464.244242] EXT4-fs (nbd0p1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none. [ 468.266306] block nbd0: NBD_DISCONNECT [ 468.266318] block nbd0: Disconnected due to user request. [ 468.266320] block nbd0: shutting down sockets [ 468.291814] Unable to handle kernel pointer dereference in virtual kernel address space [ 468.291817] Failing address: 000002aa264a7000 TEID: 000002aa264a7803 [ 468.291819] Fault in home space mode while using kernel ASCE. [ 468.291822] AS:0000000159c84007 R3:0000000000000024 [ 468.291843] Oops: 003b ilc:3 [#1] SMP [ 468.291846] Modules linked in: nbd(E-) xt_CHECKSUM(E) xt_MASQUERADE(E) xt_conntrack(E) ipt_REJECT(E) xt_tcpudp(E) nft_compat(E) nf_nat_tftp(E) nft_objref(E) nf_conntrack_tftp(E) nft_counter(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) dm_service_time(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) nf_tables(E) nfnetlink(E) sunrpc(E) zfcp(E) scsi_transport_fc(E) dm_multipath(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) mlx5_ib(E) ib_uverbs(E) ib_core(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) zcrypt_cex4(E) eadm_sch(E) sch_fq_codel(E) configfs(E) ip_tables(E) x_tables(E) ghash_s390(E) prng(E) aes_s390(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha512_s390(E) sha256_s390(E) sha1_s390(E) sha_common(E) mlx5_core(E) nvme(E) nvme_core(E) pkey(E) zcrypt(E) rng_core(E) autofs4(E) [ 468.291891] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G E 5.14.0-20210819.rc6.git0.f26c3abc432a.300.fc34.s390x+next #1 [ 468.291894] Hardware name: IBM 8561 T01 703 (LPAR) [ 468.291895] Krnl PSW : 0704c00180000000 0000000158cfe3b6 (wb_timer_fn+0x56/0x538) [ 468.291902] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 [ 468.291905] Krnl GPRS: 0000000000000200 000002aa264a7018 0000000189fc3400 0000000000000000 [ 468.291907] fffffffffffc0000 0000000000000000 00000002f767c000 0000000158cc9420 [ 468.291909] 0000000000000000 0000000189fc3410 00000001e19622a0 0000000138e9a700 [ 468.291911] 0000000080378000 00000002f767c002 0000038000d43ca0 0000038000d43c40 [ 468.291937] Krnl Code: 0000000158cfe3a4: e380b0280004 lg %r8,40(%r11) 0000000158cfe3aa: e31010900004 lg %r1,144(%r1) #0000000158cfe3b0: e31012000004 lg %r1,512(%r1) >0000000158cfe3b6: e36010980004 lg %r6,152(%r1) 0000000158cfe3bc: ec88005e007c cgij %r8,0,8,0000000158cfe478 0000000158cfe3c2: e310b0300002 ltg %r1,48(%r11) 0000000158cfe3c8: a7840058 brc 8,0000000158cfe478 0000000158cfe3cc: c0e5ffce8822 brasl %r14,00000001586cf410 [ 468.291951] Call Trace: [ 468.291953] [<0000000158cfe3b6>] wb_timer_fn+0x56/0x538 [ 468.291956] [<00000001586ca980>] call_timer_fn+0x38/0x178 [ 468.291960] [<00000001586cad58>] __run_timers.part.0+0x298/0x358 [ 468.291962] [<00000001586cae62>] run_timer_softirq+0x4a/0x88 [ 468.291964] [<0000000159149236>] __do_softirq+0x146/0x3c8 [ 468.291967] [<000000015862cbaa>] irq_exit+0xf2/0x120 [ 468.291970] [<000000015913a334>] do_ext_irq+0xd4/0x160 [ 468.291972] [<000000015914769c>] ext_int_handler+0xdc/0x110 [ 468.291974] [<0000000159147826>] psw_idle_exit+0x0/0xa [ 468.291976] ([<00000001585dbfe8>] arch_cpu_idle+0x40/0xd0) [ 468.291978] [<000000015914718a>] default_idle_call+0x42/0x108 [ 468.291980] [<000000015866ab6a>] do_idle+0xd2/0x160 [ 468.291983] [<000000015866adb6>] cpu_startup_entry+0x36/0x40 [ 468.291985] [<00000001585ef74e>] smp_start_secondary+0x86/0x90 [ 468.291987] Last Breaking-Event-Address: [ 468.291989] [<0000038000d43d30>] 0x38000d43d30 [ 468.291992] Kernel panic - not syncing: Fatal exception in interrupt The crash is likely triggered by nbd. wb_timer_fn+0x56 is block/blk-wbt.c: 237 like in the syzbot reported crash. That line was just recently touched, so i wonder whether that's related?