On Thu, 6 Jul 2017 09:39:41 +0000
"Dong, Chuanxiao" <chuanxiao.dong@xxxxxxxxx> wrote:

> Hello,
>
> We met a possible recursive locking issue and are seeking a solution to
> resolve it. The log looks like below:
>
> [ 5102.127454] ============================================
> [ 5102.133379] WARNING: possible recursive locking detected
> [ 5102.139304] 4.12.0-rc4+ #3 Not tainted
> [ 5102.143483] --------------------------------------------
> [ 5102.149407] qemu-system-x86/1620 is trying to acquire lock:
> [ 5102.155624]  (&container->group_lock){++++++}, at: [<ffffffff817768c6>] vfio_unpin_pages+0x96/0xf0
> [ 5102.165626]
> but task is already holding lock:
> [ 5102.172134]  (&container->group_lock){++++++}, at: [<ffffffff8177728f>] vfio_fops_unl_ioctl+0x5f/0x280
> [ 5102.182522]
> other info that might help us debug this:
> [ 5102.189806]  Possible unsafe locking scenario:
>
> [ 5102.196411]        CPU0
> [ 5102.199136]        ----
> [ 5102.201861]   lock(&container->group_lock);
> [ 5102.206527]   lock(&container->group_lock);
> [ 5102.211191]
>                *** DEADLOCK ***
>
> [ 5102.217796]  May be due to missing lock nesting notation
>
> [ 5102.225370] 3 locks held by qemu-system-x86/1620:
> [ 5102.230618]  #0:  (&container->group_lock){++++++}, at: [<ffffffff8177728f>] vfio_fops_unl_ioctl+0x5f/0x280
> [ 5102.241482]  #1:  (&(&iommu->notifier)->rwsem){++++..}, at: [<ffffffff810de775>] __blocking_notifier_call_chain+0x35/0x70
> [ 5102.253713]  #2:  (&vgpu->vdev.cache_lock){+.+...}, at: [<ffffffff8157b007>] intel_vgpu_iommu_notifier+0x77/0x120
> [ 5102.265163]
> stack backtrace:
> [ 5102.270022] CPU: 5 PID: 1620 Comm: qemu-system-x86 Not tainted 4.12.0-rc4+ #3
> [ 5102.277991] Hardware name: Intel Corporation S1200RP/S1200RP, BIOS S1200RP.86B.03.01.APER.061220151418 06/12/2015
> [ 5102.289445] Call Trace:
> [ 5102.292175]  dump_stack+0x85/0xc7
> [ 5102.295871]  validate_chain.isra.21+0x9da/0xaf0
> [ 5102.300925]  __lock_acquire+0x405/0x820
> [ 5102.305202]  lock_acquire+0xc7/0x220
> [ 5102.309191]  ? vfio_unpin_pages+0x96/0xf0
> [ 5102.313666]  down_read+0x2b/0x50
> [ 5102.317259]  ? vfio_unpin_pages+0x96/0xf0
> [ 5102.321732]  vfio_unpin_pages+0x96/0xf0
> [ 5102.326024]  intel_vgpu_iommu_notifier+0xe5/0x120
> [ 5102.331283]  notifier_call_chain+0x4a/0x70
> [ 5102.335851]  __blocking_notifier_call_chain+0x4d/0x70
> [ 5102.341490]  blocking_notifier_call_chain+0x16/0x20
> [ 5102.346935]  vfio_iommu_type1_ioctl+0x87b/0x920
> [ 5102.351994]  vfio_fops_unl_ioctl+0x81/0x280
> [ 5102.356660]  ? __fget+0xf0/0x210
> [ 5102.360261]  do_vfs_ioctl+0x93/0x6a0
> [ 5102.364247]  ? __fget+0x111/0x210
> [ 5102.367942]  SyS_ioctl+0x41/0x70
> [ 5102.371542]  entry_SYSCALL_64_fastpath+0x1f/0xbe
>
> The call stack is:
> vfio_fops_unl_ioctl -> vfio_iommu_type1_ioctl -> vfio_dma_do_unmap ->
> blocking_notifier_call_chain -> intel_vgpu_iommu_notifier ->
> vfio_unpin_pages.
>
> The container->group_lock is held in vfio_fops_unl_ioctl first, but
> then it is taken again in vfio_unpin_pages.

This doesn't make sense to me, but then lockdep splats usually don't at
first. If we're passing through vfio_fops_unl_ioctl() for a
VFIO_IOMMU_UNMAP_DMA, then we'll be holding a read-lock on
container->group_lock. vfio_unpin_pages() also takes a read-lock on the
same. Why is this a problem? We should be able to nest read-locks.

> Regarding this, putting vfio_unpin_pages in another thread can resolve
> this recursive locking. In this way, vfio_unpin_pages will be
> asynchronous with vfio_dma_do_unmap. Then it is possible to trigger the
> below kernel panic due to this asynchrony:

This is an invalid solution and the code is punishing you for it ;)
The user is requesting to unmap pages and we must release those pages
before the kernel ioctl returns. As you can see near the BUG_ON hit
below, we'll retrigger the blocking notifier call chain 10 times to try
to get the page we need released. If each one of those starts a thread,
there's no guarantee that any of them will run before we hit our retry
limit.
The below is completely expected in that case. Thanks,

Alex

> [ 4468.975091] ------------[ cut here ]------------
> [ 4468.976145] kernel BUG at drivers/vfio/vfio_iommu_type1.c:833!
> [ 4468.977193] invalid opcode: 0000 [#1] SMP
> [ 4468.978232] Modules linked in: bridge stp llc nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel joydev input_leds crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd hci_uart glue_helper shpchp btbcm cryptd btqca winbond_cir rc_core btintel ipmi_ssif mei_me mei intel_pch_thermal bluetooth acpi_als kfifo_buf industrialio soc_button_array intel_vbtn ipmi_devintf ipmi_msghandler intel_hid ecdh_generic intel_lpss_acpi intel_lpss spidev sparse_keymap acpi_power_meter mac_hid sunrpc parport_pc ppdev lp parport autofs4 kvmgt vfio_mdev mdev vfio_iommu_type1 vfio kvm irqbypass hid_generic usbhid i915 drm_kms_helper igb e1000e syscopyarea sysfillrect sysimgblt dca fb_sys_fops ptp pps_core drm i2c_algo_bit ahci libahci wmi video pinctrl_sunrisepoint
> [ 4468.982783]  pinctrl_intel i2c_hid hid
> [ 4468.983995] CPU: 3 PID: 1549 Comm: qemu-system-x86 Not tainted 4.12.0-rc7+ #1
> [ 4468.985132] Hardware name: Intel Corporation Kabylake Greenlow Refresh UP Server Platform/Zumba Beach Server EV, BIOS KBLSE2R1.R00.0006.B08.1702011304 02/
> [ 4468.986350] task: ffff8cd9afa28000 task.stack: ffffb136c2f68000
> [ 4468.987545] RIP: 0010:vfio_iommu_type1_ioctl+0x894/0x910 [vfio_iommu_type1]
> [ 4468.988864] RSP: 0018:ffffb136c2f6bd58 EFLAGS: 00010202
> [ 4468.990106] RAX: 0000000000100000 RBX: 00007f80f55b1410 RCX: 000000007ff00000
> [ 4468.991290] RDX: ffff8cd9af105d00 RSI: ffff8cd9b4835e40 RDI: 000000000000000b
> [ 4468.992536] RBP: ffffb136c2f6be30 R08: 0000000000100000 R09: 0000000080000000
> [ 4468.993749] R10: ffffb136c2f6bd30 R11: 000000000000013b R12: 0000000000000000
> [ 4468.994991] R13: ffff8cd9ad813b80 R14: ffff8cd9afa28000 R15: ffffb136c2f6bdc8
> [ 4468.996131] FS:  00007f80f55b2700(0000) GS:ffff8cd9c7d80000(0000) knlGS:0000000000000000
> [ 4468.997305] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4468.998525] CR2: 000001f016dd2218 CR3: 0000000472bfc000 CR4: 00000000003426e0
> [ 4468.999668] Call Trace:
> [ 4469.000916]  ? kvm_set_memory_region+0x38/0x60 [kvm]
> [ 4469.002072]  vfio_fops_unl_ioctl+0x7b/0x260 [vfio]
> [ 4469.003220]  do_vfs_ioctl+0xa1/0x5d0
> [ 4469.004443]  ? SyS_futex+0x7f/0x180
> [ 4469.005567]  SyS_ioctl+0x79/0x90
> [ 4469.006661]  entry_SYSCALL_64_fastpath+0x1e/0xa9
>
> Can you help to check this recursive locking issue?
>
> Thanks
> Chuanxiao
>