Hi,

We’re using multipath with queue mode bio and we’ve run into what seems to be a regression introduced by commit dbaf971c9cdf10843071a60dcafc1aaab3162354 in 5.5 (which was also backported to 5.4). This happens at the time the multipath device is created.

We’re running on a Cisco box with an mpt3sas HBA controller and SAS drives. The kernel is a vanilla kernel from kernel.org with a few patches in completely unrelated parts of the kernel code, with multipath 0.8.3 on Debian Buster.

We initially bisected the issue on the v5.4.x branch down to commit 7e53ea4a1641c463d5369f800734920f1dac56c2, and then also verified that a v5.5.9 build without commit dbaf971c9cdf10843071a60dcafc1aaab3162354 does not exhibit the bug, while a build with it does.

When booting our test platform with this commit included, we see a lot of kernel WARNING traces like the following one, and the multipath devices are unusable:

[ 34.559589] ------------[ cut here ]------------
[ 34.559600] WARNING: CPU: 3 PID: 1432 at kernel/workqueue.c:1622 __queue_delayed_work+0x70/0x90
[ 34.559600] Modules linked in: dm_service_time nvmet_tcp mlx5_ib mlx5_core ib_uverbs pci_hyperv_intf nvmet_rdma rdma_cm iw_cm ib_cm ib_core nvmet nvme_fabrics iscsi_target_mod target_core_iblock target_core_mod configfs mpt3sas raid_class scsi_transport_sas dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul ipmi_ssif crc32_pclmul ghash_clmulni_intel snd_pcm snd_timer snd soundcore aesni_intel mei_me iTCO_wdt crypto_simd cryptd input_leds joydev glue_helper mei iTCO_vendor_support pcspkr ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter mac_hid ip_tables x_tables autofs4 usb_storage hid_generic usbkbd usbmouse usbhid hid fnic libfcoe ahci mxm_wmi libfc libahci lpc_ich enic scsi_transport_fc wmi
[ 34.559634] CPU: 3 PID: 1432 Comm: systemd-udevd Not tainted 5.5.8 #98
[ 34.559634] Hardware name: Cisco Systems Inc UCSC-C3K-M4SRB/UCSC-C3K-M4SRB, BIOS C3X60M4.4.0.2f.0.1113190831 11/13/2019
[ 34.559637] RIP: 0010:__queue_delayed_work+0x70/0x90
[ 34.559638] Code: 41 81 f8 00 02 00 00 48 89 4a 30 75 2a e9 c8 cd 06 00 44 89 c7 e9 80 fb ff ff 0f 0b eb cb 0f 0b 48 81 7a 38 40 a0 0a b2 74 ab <0f> 0b 48 83 7a 28 00 74 a9 0f 0b eb a5 44 89 c6 e9 ab bc 06 00 90
[ 34.559639] RSP: 0018:ffffb6e88e9b3830 EFLAGS: 00010007
[ 34.559640] RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000000
[ 34.559641] RDX: ffff9e9c38006c30 RSI: ffff9e9c33933c00 RDI: ffff9e9c38006c50
[ 34.559642] RBP: ffff9e9c33828e00 R08: 0000000000000200 R09: ffff9e7c326cc458
[ 34.559643] R10: 0000000000000000 R11: 01fffffffffffffe R12: 0000000000000000
[ 34.559643] R13: ffff9e7c050400b0 R14: ffff9e7c05040000 R15: 0000000000000001
[ 34.559644] FS: 00007ff73a5cbd40(0000) GS:ffff9e7c3f6c0000(0000) knlGS:0000000000000000
[ 34.559645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 34.559645] CR2: 00007ffdf1e8ea48 CR3: 0000001ff447a001 CR4: 00000000003606e0
[ 34.559646] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 34.559646] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 34.559647] Call Trace:
[ 34.559651] queue_delayed_work_on+0x24/0x40
[ 34.559656] __pg_init_all_paths+0x75/0xc0 [dm_multipath]
[ 34.559658] pg_init_all_paths+0x23/0x40 [dm_multipath]
[ 34.559660] __multipath_map_bio+0x1b5/0x230 [dm_multipath]
[ 34.559664] __map_bio+0x42/0x170
[ 34.559666] __split_and_process_non_flush+0x132/0x1d0
[ 34.559669] __split_and_process_bio+0x94/0x240
[ 34.559672] ? blk_throtl_bio+0x141/0xbf0
[ 34.559674] dm_process_bio+0x117/0x230
[ 34.559678] ? generic_make_request_checks+0x23a/0x5c0
[ 34.559680] dm_make_request+0x3b/0xb0
[ 34.559681] generic_make_request+0x11f/0x2e0
[ 34.559683] ? submit_bio+0x72/0x140
[ 34.559685] submit_bio+0x72/0x140
[ 34.559689] mpage_readpages+0x154/0x190
[ 34.559692] ? bdev_evict_inode+0xf0/0xf0
[ 34.559697] read_pages+0x71/0x1a0
[ 34.559700] ? __do_page_cache_readahead+0x199/0x1b0
[ 34.559701] __do_page_cache_readahead+0x199/0x1b0
[ 34.559703] force_page_cache_readahead+0xb7/0xe0
[ 34.559705] generic_file_read_iter+0x7f3/0xbf0
[ 34.559708] ? _copy_to_user+0x22/0x30
[ 34.559713] ? cp_new_stat+0x154/0x190
[ 34.559716] new_sync_read+0x11b/0x1b0
[ 34.559718] vfs_read+0x90/0x130
[ 34.559720] ksys_read+0x5c/0xe0
[ 34.559725] do_syscall_64+0x52/0x1a0
[ 34.559730] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 34.559731] RIP: 0033:0x7ff73adac461
[ 34.559733] Code: fe ff ff 50 48 8d 3d fe d0 09 00 e8 e9 03 02 00 66 0f 1f 84 00 00 00 00 00 48 8d 05 99 62 0d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48
[ 34.559734] RSP: 002b:00007ffdf1e90ba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 34.559735] RAX: ffffffffffffffda RBX: 0000557789799f50 RCX: 00007ff73adac461
[ 34.559735] RDX: 0000000000000040 RSI: 000055778978b588 RDI: 0000000000000006
[ 34.559736] RBP: 0000557789799fa0 R08: 000055778978b560 R09: 00007ff73ae7e330
[ 34.559737] R10: 000055778977d010 R11: 0000000000000246 R12: 0000057541e80000
[ 34.559737] R13: 0000000000000040 R14: 000055778978b578 R15: 000055778978b560
[ 34.559738] ---[ end trace 865597b9b72c7dc2 ]---

Let me know if there is anything else that would help understand what is going on.

Best,
Jean-Francois Remy
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel