Re: [bug report][bisected] WARNING: CPU: 4 PID: 10482 at block/mq-deadline.c:597 dd_exit_sched+0x198/0x1d0

On Tue, Dec 7, 2021 at 6:40 PM Halil Pasic <pasic@xxxxxxxxxxxxx> wrote:
>
> Hi Jens!
>
> What is the status of this?

Hi Halil,
This issue should already be fixed by the following commit:

commit 2b504bd4841bccbf3eb83c1fec229b65956ad8ad (tag: block-5.16-2021-11-19)
Author: Ming Lei <ming.lei@xxxxxxxxxx>
Date:   Thu Nov 18 23:30:41 2021 +0800

    blk-mq: don't insert FUA request with data into scheduler queue

    We never insert flush request into scheduler queue before.

    Recently commit d92ca9d8348f ("blk-mq: don't handle non-flush requests in
    blk_insert_flush") tries to handle FUA data request as normal request.
    This way has caused warning[1] in mq-deadline dd_exit_sched() or io hang in
    case of kyber since RQF_ELVPRIV isn't set for flush request, then
    ->finish_request won't be called.

    Fix the issue by inserting FUA data request with blk_mq_request_bypass_insert()
    when the device supports FUA, just like what we did before.
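
To illustrate the accounting problem the commit message describes, here is a
toy Python model. It is not kernel code: PrioStats, run() and bypass_fua are
hypothetical names made up for this sketch. It assumes the check at
block/mq-deadline.c:597 fires when "inserted" minus "completed" is nonzero at
scheduler exit, and that a request inserted without RQF_ELVPRIV never reaches
the completion accounting.

```python
from dataclasses import dataclass


@dataclass
class PrioStats:
    """Toy stand-in for the per-priority counters printed by the warning."""
    inserted: int = 0
    completed: int = 0

    def queued(self) -> int:
        # Requests the scheduler believes it still owns.
        return self.inserted - self.completed


def run(n_normal: int, n_fua: int, bypass_fua: bool) -> PrioStats:
    """Simulate n_normal ordinary requests and n_fua FUA data requests."""
    stats = PrioStats()
    for _ in range(n_normal):
        stats.inserted += 1    # request enters the scheduler queue
        stats.completed += 1   # finish hook runs when the request completes
    for _ in range(n_fua):
        if bypass_fua:
            continue           # modeled fix: the scheduler never sees the request
        stats.inserted += 1    # modeled bug: inserted, but the finish hook is skipped
    return stats


buggy = run(2735, 1, bypass_fua=False)
fixed = run(2735, 1, bypass_fua=True)
print(buggy.queued(), fixed.queued())
```

In this model the buggy path ends with one request still counted as queued,
which is the same shape as "i 2736 ... c 2735" in the log quoted below, while
the bypass path ends balanced.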

>
> I see just one fix for e0d78afeb8d1 ("block: fix too broad elevator check in
> blk_mq_free_request()") which pre-dates this bug report.
>
> We see something similar (i.e. the exact same warning) in our CI occasionally,
> when the nbd module is unloaded. Unfortunately I can't trigger it reliably and
> frequently enough to confirm that the problem is caused by the aforementioned
> commit. All I know is that we occasionally do hit the same warning.
>
> Thanks in advance!
>
> Regards,
> Halil
>
> If you're interested, here are the relevant kernel messages we observed:
>
> [ 2697.795977] block nbd0: shutting down sockets
> [ 2697.949807] ------------[ cut here ]------------
> [ 2697.949816] statistics for priority 1: i 2736 m 0 d 2739 c 2735
> [ 2697.949839] WARNING: CPU: 5 PID: 163229 at block/mq-deadline.c:597 dd_exit_sched+0x118/0x138
> [ 2697.949849] Modules linked in: nbd(-) crc32_generic algif_hash dm_mirror dm_region_hash dm_log algif_skcipher af_alg paes_s390 dm_crypt encrypted_keys loop lcs ctcm fsm kvm binfmt_misc sunrpc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua zfcp scsi_transport_fc mlx5_ib ib_uverbs ib_core s390_trng vfio_ccw mdev vfio_iommu_type1 zcrypt_cex4 vfio eadm_sch sch_fq_codel configfs ip_tables x_tables ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 mlx5_core sha1_s390 sha_common nvme nvme_core pkey zcrypt rng_core autofs4 [last unloaded: trace_printk]
> [ 2697.949904] CPU: 5 PID: 163229 Comm: modprobe Not tainted 5.16.0-20211122.rc1.git0.b2753a24042f.300.fc34.s390x #1
> [ 2697.949907] Hardware name: IBM 8561 T01 703 (LPAR)
> [ 2697.949908] Krnl PSW : 0704c00180000000 00000002eac7f4f4 (dd_exit_sched+0x11c/0x138)
> [ 2697.949912]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [ 2697.949915] Krnl GPRS: 00000000ffffffea 0000000200000027 0000000000000033 00000000fffeffff
> [ 2697.949917]            00000002f6cfc000 0000038000000001 0000000000000ab3 0000000100000000
> [ 2697.949919]            000000025d5caf48 000000025d5cae00 0000000000000001 000000025d5cae80
> [ 2697.949920]            00000002b31f2100 000000025d5caf4a 00000002eac7f4f0 0000038007167af8
> [ 2697.949927] Krnl Code: 00000002eac7f4e4: e310f0a00024        stg     %r1,160(%r15)
>                           00000002eac7f4ea: c0e50025381f        brasl   %r14,00000002eb126528
>                          #00000002eac7f4f0: af000000            mc      0,0
>                          >00000002eac7f4f4: a7f4ffca            brc     15,00000002eac7f488
>                           00000002eac7f4f8: b9040028            lgr     %r2,%r8
>                           00000002eac7f4fc: c0e50008acda        brasl   %r14,00000002ead94eb0
>                           00000002eac7f502: a7f4ffb5            brc     15,00000002eac7f46c
>                           00000002eac7f506: af000000            mc      0,0
> [ 2697.949941] Call Trace:
> [ 2697.949943]  [<00000002eac7f4f4>] dd_exit_sched+0x11c/0x138
> [ 2697.949946] ([<00000002eac7f4f0>] dd_exit_sched+0x118/0x138)
> [ 2697.949948]  [<00000002eac5fddc>] blk_mq_exit_sched+0xb4/0xd8
> [ 2697.949951]  [<00000002eac44e38>] __elevator_exit+0x40/0x60
> [ 2697.949955]  [<00000002eac4abda>] blk_release_queue+0xc2/0x168
> [ 2697.949958]  [<00000002ead7f78a>] kobject_cleanup+0x5a/0x180
> [ 2697.949961]  [<00000002eac62270>] disk_release+0x70/0x90
> [ 2697.949963]  [<00000002eae04b38>] device_release+0x48/0xb0
> [ 2697.949968]  [<00000002ead7f78a>] kobject_cleanup+0x5a/0x180
> [ 2697.949970]  [<000003ff806b724a>] nbd_dev_remove+0x3a/0x90 [nbd]
> [ 2697.949976]  [<000003ff806bc1c2>] nbd_cleanup+0xda/0x120 [nbd]
> [ 2697.949980]  [<00000002ea67838a>] __do_sys_delete_module+0x1a2/0x268
> [ 2697.949984]  [<00000002eb1369ec>] __do_syscall+0x1d4/0x200
> [ 2697.949987]  [<00000002eb143f32>] system_call+0x82/0xb0
> [ 2697.949990] Last Breaking-Event-Address:
> [ 2697.949991]  [<00000002eb126588>] __warn_printk+0x60/0x68
> [ 2697.949996] Kernel panic - not syncing: panic_on_warn set ...
>


-- 
Best Regards,
  Yi Zhang




