Tested against v4.13-rc7. With this patchset applied, I/O no longer appears to
hang, but once (only once, not on every resume) I got the following stack trace
on resume:

===
[   55.577173] ata1.00: Security Log not supported
[   55.580690] ata1.00: configured for UDMA/100
[   55.582257] ------------[ cut here ]------------
[   55.583924] usb 1-1: reset high-speed USB device number 2 using xhci_hcd
[   55.587489] WARNING: CPU: 3 PID: 646 at lib/percpu-refcount.c:361 percpu_ref_reinit+0x21/0x80
[   55.590073] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat iTCO_wdt kvm_intel bochs_drm ppdev kvm ttm iTCO_vendor_support drm_kms_helper irqbypass 8139too input_leds drm evdev psmouse led_class pcspkr syscopyarea joydev sysfillrect lpc_ich 8139cp parport_pc sysimgblt mousedev intel_agp i2c_i801 fb_sys_fops mii mac_hid intel_gtt parport qemu_fw_cfg button sch_fq_codel ip_tables x_tables xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c crc32c_generic algif_skcipher af_alg dm_crypt dm_mod dax raid10 md_mod sr_mod cdrom sd_mod hid_generic usbhid hid uhci_hcd crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_rng ahci xhci_pci serio_raw pcbc ehci_pci xhci_hcd rng_core atkbd libps2 libahci ehci_hcd libata aesni_intel aes_x86_64 crypto_simd glue_helper cryptd
[   55.611580]  usbcore virtio_pci scsi_mod usb_common virtio_ring virtio i8042 serio
[   55.614305] CPU: 3 PID: 646 Comm: kworker/u8:23 Not tainted 4.13.0-pf1 #1
[   55.616611] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[   55.619903] Workqueue: events_unbound async_run_entry_fn
[   55.621888] task: ffff88001b271e00 task.stack: ffffc90000a2c000
[   55.623674] RIP: 0010:percpu_ref_reinit+0x21/0x80
[   55.625751] RSP: 0000:ffffc90000a2fdb0 EFLAGS: 00010002
[   55.628687] RAX: 0000000000000002 RBX: ffff88001dd80768 RCX: ffff88001dd80758
[   55.631475] RDX: 0000000000000001 RSI: 0000000000000212 RDI: ffffffff81f3e2f0
[   55.633694] RBP: ffffc90000a2fdc0 R08: 0000000cc61e7800 R09: ffff88001f9929c0
[   55.637144] R10: ffffffffffec3296 R11: 7fffffffffffffff R12: 0000000000000246
[   55.642456] R13: ffff88001f410800 R14: ffff88001f414300 R15: 0000000000000000
[   55.644832] FS:  0000000000000000(0000) GS:ffff88001f980000(0000) knlGS:0000000000000000
[   55.647388] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   55.649608] CR2: 00000000ffffffff CR3: 000000001aa50000 CR4: 00000000001406e0
[   55.652688] Call Trace:
[   55.654597]  blk_unfreeze_queue+0x2f/0x50
[   55.656794]  scsi_device_resume+0x28/0x70 [scsi_mod]
[   55.659059]  scsi_dev_type_resume+0x38/0x90 [scsi_mod]
[   55.660875]  async_sdev_resume+0x15/0x20 [scsi_mod]
[   55.662564]  async_run_entry_fn+0x36/0x150
[   55.664241]  process_one_work+0x1de/0x430
[   55.666018]  worker_thread+0x47/0x3f0
[   55.667387]  kthread+0x125/0x140
[   55.672740]  ? process_one_work+0x430/0x430
[   55.674971]  ? kthread_create_on_node+0x70/0x70
[   55.677110]  ret_from_fork+0x25/0x30
[   55.679098] Code: 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb 48 c7 c7 f0 e2 f3 81 e8 0a de 32 00 49 89 c4 48 8b 43 08 a8 03 75 42 <0f> ff 48 83 63 08 fd 65 ff 05 31 7d cc 7e 48 8b 53 08 f6 c2 03
[   55.684822] ---[ end trace dbbf5aed3cf35331 ]---
[   55.714306] PM: resume of devices complete after 500.175 msecs
[   55.717299] OOM killer enabled.
===

The warning is triggered here (lib/percpu-refcount.c:361):

===
355 void percpu_ref_reinit(struct percpu_ref *ref)
356 {
357 	unsigned long flags;
358
359 	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
360
361 	WARN_ON_ONCE(!percpu_ref_is_zero(ref)); // <--
362
363 	ref->percpu_count_ptr &= ~__PERCPU_REF_DEAD;
364 	percpu_ref_get(ref);
365 	__percpu_ref_switch_mode(ref, NULL);
366
367 	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
368 }
===

On Thursday, 31 August 2017 19:38:34 CEST Ming Lei wrote:
> On Thu, Aug 31, 2017 at 07:34:06PM +0200, Oleksandr Natalenko wrote:
> > Since I'm in CC, does this series aim to replace 2 patches I've tested
> > before:
> >
> > blk-mq: add requests in the tail of hctx->dispatch
> > blk-mq: align to legacy's implementation of blk_execute_rq
> >
> > ?
>
> Yeah, this solution is more generic, and the old one in above
> two patches may run into the same deadlock inevitably.
>
> Oleksandr, could you test this patchset and provide the feedback?
>
> BTW, it fixes the I/O hang in my raid10 test, but I just write
> 'devices' to pm_test.
>
> Thanks!