On 4/15/19 5:46 PM, Roman Penyaev wrote: > On 2019-04-13 05:42, Bart Van Assche wrote: >> On 4/9/19 2:08 AM, Bob Liu wrote: >>> void blk_freeze_queue_start(struct request_queue *q) >>> { >>> - int freeze_depth; >>> - >>> - freeze_depth = atomic_inc_return(&q->mq_freeze_depth); >>> - if (freeze_depth == 1) { >>> + mutex_lock(&q->mq_freeze_lock); >>> + if (++q->mq_freeze_depth == 1) { >>> percpu_ref_kill(&q->q_usage_counter); >>> + mutex_unlock(&q->mq_freeze_lock); >>> if (queue_is_mq(q)) >>> blk_mq_run_hw_queues(q, false); >>> + } else { >>> + mutex_unlock(&q->mq_freeze_lock); >>> } >>> } >> Have you considered to move the mutex_unlock() call to the end of the function >> such that there is only one mutex_unlock() call instead of two? In case you >> would be worried about holding the mutex around the code that runs the queue, >> how about changing the blk_mq_run_hw_queues() call such that the queues are >> run async? > > Hi Bart, > > The only purpose of 'mq_freeze_lock' is to avoid race between mq_freeze_depth > variable and the following usage of q_usage_counter percpu ref. I admit that > my original comment is quite unclear, but locked section should be as short > as possible, so returning to your question: better to have two unlock calls > instead of expanding locked critical section. > > Unfortunately I do not have hardware to play again with the issue, but I see > there is a nice candidate for a quick reproduction: null_blk queues with > shared tags. Having several queues with shared tags and a script, which > powers on/off (I mean 'power' entry of configfs of the null_blk) different > null devices from different cpus it is quite possible to trigger the issue. > Random short msdelay() in correct places can help to increase probability to > hit the issue quite fast. > > > But Bob, what is the backtrace of the issue you hit? What is the device? > Conditions to reproduce the issue are quite specific and frankly I did not > find any "naked" (without any locks) calls of blk_mq_freeze/unfreeze sequence, > the only candidate which I found, seems, null_blk (not 100% sure, but worth to > try). > Yes, it can be reproduced with null_blk. But I added a msleep to save time. diff --git a/block/blk-mq.c b/block/blk-mq.c index 9437a5e..875967f1 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -202,6 +202,7 @@ void blk_mq_unfreeze_queue(struct request_queue *q) freeze_depth = atomic_dec_return(&q->mq_freeze_depth); WARN_ON_ONCE(freeze_depth < 0); if (!freeze_depth) { + msleep(1000); percpu_ref_resurrect(&q->q_usage_counter); wake_up_all(&q->mq_freeze_wq); } Below is the backtrace: --- [ 234.604280] ------------[ cut here ]------------ [ 234.604288] percpu_ref_kill_and_confirm called more than once on blk_queue_usage_counter_release! [ 234.604305] WARNING: CPU: 0 PID: 11854 at lib/percpu-refcount.c:336 percpu_ref_kill_and_confirm+0x99/0xb0 [ 234.604306] Modules linked in: null_blk thunderbolt ccm xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc devlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bnep arc4 intel_rapl iwlmvm x86_pkg_temp_thermal intel_powerclamp mac80211 coretemp snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp kvm_intel snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_conexant snd_compress ac97_bus snd_hda_codec_generic snd_pcm_dmaengine crct10dif_pclmul snd_hda_intel crc32_pclmul snd_hda_codec ghash_clmulni_intel iwlwifi snd_hda_core aesni_intel snd_hwdep snd_pcm thinkpad_acpi uvcvideo nvram ledtrig_audio aes_x86_64 crypto_simd btusb cryptd btrtl glue_helper snd_seq_midi btbcm snd_seq_midi_event intel_cstate videobuf2_vmalloc btintel snd_rawmidi cfg80211 bluetooth intel_rapl_perf [ 234.604361] hid_sensor_accel_3d snd_seq hid_sensor_magn_3d videobuf2_memops videobuf2_v4l2 videobuf2_common hid_sensor_rotation rtsx_pci_ms hid_sensor_als wmi_bmof videodev input_leds joydev serio_raw hid_sensor_gyro_3d intel_wmi_thunderbolt hid_sensor_trigger snd_seq_device industrialio_triggered_buffer kfifo_buf snd_timer hid_sensor_iio_common media industrialio mei_me snd memstick ecdh_generic mei ucsi_acpi processor_thermal_device typec_ucsi intel_soc_dts_iosf intel_pch_thermal typec soundcore int3403_thermal int340x_thermal_zone int3400_thermal acpi_thermal_rel mac_hid acpi_pad sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid0 multipath linear hid_sensor_custom hid_sensor_hub intel_ishtp_hid hid_logitech_hidpp i915 kvmgt vfio_mdev mdev vfio_iommu_type1 vfio kvm irqbypass i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm wacom [ 234.604417] hid_logitech_dj rtsx_pci_sdmmc hid_generic e1000e usbhid psmouse hid intel_ish_ipc rtsx_pci intel_ishtp wmi video [ 234.604430] CPU: 0 PID: 11854 Comm: sh Not tainted 5.0.0+ #33 [ 234.604431] Hardware name: LENOVO 20LJS2EV08/20LJS2EV08, BIOS R0SET33W (1.17 ) 07/18/2018 [ 234.604436] RIP: 0010:percpu_ref_kill_and_confirm+0x99/0xb0 [ 234.604439] Code: 00 eb d3 80 3d 76 a4 2a 01 00 75 ab 48 8b 53 10 48 c7 c6 00 fd a6 8f 48 c7 c7 b0 56 d5 8f c6 05 5b a4 2a 01 01 e8 f7 a2 b1 ff <0f> 0b 48 8b 43 08 eb 85 90 90 90 90 90 90 90 90 90 90 90 90 90 90 [ 234.604441] RSP: 0018:ffffa2880ae9fd00 EFLAGS: 00010086 [ 234.604443] RAX: 0000000000000000 RBX: ffff9768b3b815d8 RCX: 0000000000000000 [ 234.604445] RDX: 0000000000000055 RSI: ffffffff905758f5 RDI: 0000000000000046 [ 234.604447] RBP: ffffa2880ae9fd18 R08: fffffffa314c837d R09: ffffffffffffffff [ 234.604449] R10: ffffe3c84fc0a200 R11: 000000000001abb4 R12: 0000000000000246 [ 234.604451] R13: 0000000000000000 R14: ffff9768b0448300 R15: ffff9768b2703600 [ 234.604454] FS: 00007fae2f3e6540(0000) GS:ffff9768d1400000(0000) knlGS:0000000000000000 [ 234.604456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 234.604458] CR2: 000056078102ee08 CR3: 0000000431384002 CR4: 00000000003606f0 [ 234.604460] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 234.604462] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 234.604463] Call Trace: [ 234.604471] blk_freeze_queue_start+0x2d/0x50 [ 234.604475] blk_set_queue_dying+0x17/0x40 [ 234.604479] blk_cleanup_queue+0x26/0xd0 [ 234.604486] null_del_dev+0x59/0x110 [null_blk] [ 234.604491] nullb_device_power_store+0xca/0x100 [null_blk] [ 234.604495] configfs_write_file+0xb9/0x120 [ 234.604500] __vfs_write+0x3a/0x1b0 [ 234.604505] ? apparmor_file_permission+0x1a/0x20 [ 234.604509] ? security_file_permission+0x3b/0xf0 [ 234.604513] ? _cond_resched+0x1a/0x50 [ 234.604516] vfs_write+0xb8/0x1b0 [ 234.604519] ksys_write+0x55/0xc0 [ 234.604523] __x64_sys_write+0x1a/0x20 [ 234.604528] do_syscall_64+0x5a/0x110 [ 234.604532] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 234.604535] RIP: 0033:0x7fae2ef04154 [ 234.604537] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 b1 07 2e 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 41 54 55 49 89 d4 53 48 89 f5 [ 234.604538] RSP: 002b:00007ffddfe68e88 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 234.604540] RAX: ffffffffffffffda RBX: 00005649349e68a0 RCX: 00007fae2ef04154 [ 234.604542] RDX: 0000000000000002 RSI: 00005649349e68a0 RDI: 0000000000000001 [ 234.604543] RBP: 0000000000000002 R08: 0000000000000077 R09: 0000000000000000 [ 234.604544] R10: 00005649349e6010 R11: 0000000000000246 R12: 0000000000000001 [ 234.604545] R13: 0000000000000002 R14: 7fffffffffffffff R15: 00007ffddfe6aed9 [ 234.604548] ---[ end trace d99129291464ebae ]---