Re: mmap_assert_write_locked warnings during vhost_vdpa_fault

Jason Wang <jasowang@xxxxxxxxxx> · Tue, 18 Jun 2024 09:17:53 +0800

On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@xxxxxxxxxx> wrote:
>
> Hi,
>
> After commit ba168b52bf8e ("mm: use rwsem assertion macros for
> mmap_lock") was submitted, we started getting a lot of the
> following warnings about a missing mmap write lock during VM boot:
>
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
> track_pfn_remap+0x12b/0x130
> Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
> nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
> openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle ip6table_nat
> iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack xt_MASQUERADE
> nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
> rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser
> libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm mlx5_ib
> ib_uverbs ib_core fuse mlx5_core
> CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G        W
> 6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> RIP: 0010:track_pfn_remap+0x12b/0x130
> Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
> 89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
> 00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
> RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
> RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
> R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
> R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
> FS:  00007f678d800700(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
> Call Trace:
>  <TASK>
>  ? __warn+0x78/0x110
>  ? track_pfn_remap+0x12b/0x130
>  ? report_bug+0x16d/0x180
>  ? handle_bug+0x3c/0x60
>  ? exc_invalid_op+0x14/0x70
>  ? asm_exc_invalid_op+0x16/0x20
>  ? track_pfn_remap+0x12b/0x130
>  remap_pfn_range+0x41/0xa0
>  vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
>  __do_fault+0x2f/0xb0
>  __handle_mm_fault+0x13d3/0x2210
>  handle_mm_fault+0xb0/0x260
>  fixup_user_fault+0x77/0x170
>  hva_to_pfn+0x2c5/0x4b0
>  kvm_faultin_pfn+0xd7/0x510
>  kvm_tdp_page_fault+0x111/0x190
>  kvm_mmu_do_page_fault+0x105/0x230
>  kvm_mmu_page_fault+0x7d/0x620
>  ? vmx_deliver_interrupt+0x110/0x190
>  ? __apic_accept_irq+0x16c/0x270
>  ? vmx_vmexit+0x8d/0xc0
>  vmx_handle_exit+0x110/0x640
>  kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
>  kvm_vcpu_ioctl+0x263/0x6a0
>  ? futex_wake+0x81/0x180
>  __x64_sys_ioctl+0x4a7/0x9d0
>  ? __x64_sys_futex+0x73/0x1c0
>  ? kvm_on_user_return+0x86/0x90
>  do_syscall_64+0x4c/0x100
>  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> RIP: 0033:0x7f679186a17b
> Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
> c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
> c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
> RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
> RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
>  </TASK>
> ---[ end trace 0000000000000000 ]---
>
> The warnings show up only when the vdpa page-per-vq option is used (doorbell
> mapping to guest).
>
> The issue seems to have existed before, but was visible only with CONFIG_LOCKDEP
> enabled. I tried to find out whether it was introduced in a recent kernel, but
> stopped after going as far back as 6.5, where the issue was still visible.
>
> The warning is triggered for the following call chain:
> vhost_vdpa_fault()
>  -> remap_pfn_range()
>   -> remap_pfn_range_notrack()
>    -> vm_flags_set()
>     -> vma_start_write()
>      -> __is_vma_write_locked()
>       -> mmap_assert_write_locked()
>
>
> I've been trying to work out whether the mmap write lock is dropped somewhere
> in the above call chain or never taken at all, but I couldn't make much sense
> of it...

I also had a glance at vfio_pci_mmap_fault(); it seems to do something similar.
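
For illustration only, the lock-mode mismatch behind the call chain above can
be modelled in user space. This is a minimal sketch in plain C, not kernel
code: every toy_* name is hypothetical, and it assumes that the fault path
runs with mmap_lock held for read only, whereas an ->mmap() handler is called
with it held for write.

```c
#include <stdio.h>

/* Toy model of mmap_lock: records only how the lock is currently held. */
enum toy_lock_mode { TOY_UNLOCKED, TOY_READ_LOCKED, TOY_WRITE_LOCKED };

struct toy_mm {
	enum toy_lock_mode mode;
	int warnings;	/* number of failed write-lock assertions */
};

/* Hypothetical stand-in for mmap_assert_write_locked(): warn, don't abort. */
static void toy_assert_write_locked(struct toy_mm *mm, const char *caller)
{
	if (mm->mode != TOY_WRITE_LOCKED) {
		mm->warnings++;
		fprintf(stderr, "WARNING: %s: lock not held for write\n", caller);
	}
}

/*
 * Hypothetical stand-in for remap_pfn_range(): it changes VMA flags, so it
 * expects the caller to hold the lock for write.
 */
static int toy_remap_pfn_range(struct toy_mm *mm, const char *caller)
{
	toy_assert_write_locked(mm, caller);
	return 0;
}

/* Models an ->mmap() handler: entered with the lock held for write. */
static int toy_mmap_path(struct toy_mm *mm)
{
	int ret;

	mm->mode = TOY_WRITE_LOCKED;
	ret = toy_remap_pfn_range(mm, "toy_mmap_path");
	mm->mode = TOY_UNLOCKED;
	return ret;
}

/* Models a ->fault() handler: entered with the lock held for read only. */
static int toy_fault_path(struct toy_mm *mm)
{
	int ret;

	mm->mode = TOY_READ_LOCKED;
	ret = toy_remap_pfn_range(mm, "toy_fault_path");
	mm->mode = TOY_UNLOCKED;
	return ret;
}
```

Starting from a zero-initialised struct toy_mm, toy_mmap_path() leaves the
warnings counter at 0, while toy_fault_path() increments it. That matches the
pattern in the report: the same remap call is fine at mmap time but trips the
assertion when issued from the fault handler.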

> Any ideas of what could have gone wrong here?

Adding Peter for more thoughts here.

Thanks

>
> Thanks,
> Dragos