Re: Potential bug in TDP MMU

On Fri, Dec 10, 2021 at 3:05 PM Ignat Korchagin <ignat@xxxxxxxxxxxxxx> wrote:
>
> I've been trying to figure out the difference between "good" runs and
> "bad" runs of gvisor. So, I've been running the following bpftrace
> one-liner:
>
> $ bpftrace -e 'kprobe:kvm_set_pfn_dirty { @[kstack] = count(); }'
>
> while also executing a single:
>
> $ sudo runsc --platform=kvm --network=none do echo ok
>
> For "good" runs the stacks are the following:

The stacks help, thanks for including them. It looks like a race
during do_exit teardown. One thing I notice is that
do_exit->mmput->kvm_mmu_zap_all can interleave with
kvm_vcpu_release->kvm_tdp_mmu_put_root (full call chains omitted),
since the former path allows yielding. But I don't yet see how that
could lead to any issues, let alone cause us to encounter a PFN in the
EPT with a zero refcount.

I'll take a closer look next week.
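As an aside, comparisons like "which stacks appear only in the bad run"
are easy to automate. A minimal sketch (not part of the report; the
parse_stacks helper is mine, and the sample inputs below are abbreviated
two-frame versions of the traces quoted further down):

```python
# Sketch: diff the stack sets printed by
#   bpftrace -e 'kprobe:kvm_set_pfn_dirty { @[kstack] = count(); }'
# between a "good" and a "bad" run. bpftrace prints each stack as a
# "@[ ...frames... ]: count" block, so a simple regex split suffices.
import re

def parse_stacks(output):
    """Return a set of stacks, each a tuple of frame names (offsets stripped)."""
    stacks = set()
    for block in re.findall(r"@\[(.*?)\]:\s*\d+", output, re.S):
        frames = tuple(line.strip().split("+")[0]
                       for line in block.strip().splitlines() if line.strip())
        stacks.add(frames)
    return stacks

# Abbreviated sample input; real runs would feed the full bpftrace output.
good = """@[
    kvm_set_pfn_dirty+1
    kvm_tdp_mmu_zap_all+34
]: 365
"""
bad = """@[
    kvm_set_pfn_dirty+1
    kvm_tdp_mmu_zap_all+34
]: 344
@[
    kvm_set_pfn_dirty+1
    kvm_tdp_mmu_put_root+465
]: 2
"""

# Stacks present only in the bad run point at the suspect path.
for stack in sorted(parse_stacks(bad) - parse_stacks(good)):
    print(" -> ".join(stack))
```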

>
> # bpftrace -e 'kprobe:kvm_set_pfn_dirty { @[kstack] = count(); }'
> Attaching 1 probe...
> ^C
>
> @[
>     kvm_set_pfn_dirty+1
>     __handle_changed_spte+2535
>     __tdp_mmu_set_spte+396
>     zap_gfn_range+2229
>     kvm_tdp_mmu_unmap_gfn_range+331
>     kvm_unmap_gfn_range+774
>     kvm_mmu_notifier_invalidate_range_start+743
>     __mmu_notifier_invalidate_range_start+508
>     unmap_vmas+566
>     unmap_region+494
>     __do_munmap+1172
>     __vm_munmap+226
>     __x64_sys_munmap+98
>     do_syscall_64+64
>     entry_SYSCALL_64_after_hwframe+68
> ]: 1
> @[
>     kvm_set_pfn_dirty+1
>     __handle_changed_spte+2535
>     __tdp_mmu_set_spte+396
>     zap_gfn_range+2229
>     kvm_tdp_mmu_unmap_gfn_range+331
>     kvm_unmap_gfn_range+774
>     kvm_mmu_notifier_invalidate_range_start+743
>     __mmu_notifier_invalidate_range_start+508
>     zap_page_range_single+870
>     unmap_mapping_pages+434
>     shmem_fallocate+2518
>     vfs_fallocate+684
>     __x64_sys_fallocate+181
>     do_syscall_64+64
>     entry_SYSCALL_64_after_hwframe+68
> ]: 32
> @[
>     kvm_set_pfn_dirty+1
>     __handle_changed_spte+2535
>     __handle_changed_spte+1746
>     __handle_changed_spte+1746
>     __handle_changed_spte+1746
>     __tdp_mmu_set_spte+396
>     zap_gfn_range+2229
>     __kvm_tdp_mmu_zap_gfn_range+162
>     kvm_tdp_mmu_zap_all+34
>     kvm_mmu_zap_all+518
>     kvm_mmu_notifier_release+83
>     __mmu_notifier_release+420
>     exit_mmap+965
>     mmput+167
>     do_exit+2482
>     do_group_exit+236
>     get_signal+1000
>     arch_do_signal_or_restart+580
>     exit_to_user_mode_prepare+300
>     syscall_exit_to_user_mode+25
>     do_syscall_64+77
>     entry_SYSCALL_64_after_hwframe+68
> ]: 365
>
> For "bad" runs, when I get the warning - I get this:
>
> # bpftrace -e 'kprobe:kvm_set_pfn_dirty { @[kstack] = count(); }'
> Attaching 1 probe...
> ^C
>
> @[
>     kvm_set_pfn_dirty+1
>     __handle_changed_spte+2535
>     __tdp_mmu_set_spte+396
>     zap_gfn_range+2229
>     kvm_tdp_mmu_unmap_gfn_range+331
>     kvm_unmap_gfn_range+774
>     kvm_mmu_notifier_invalidate_range_start+743
>     __mmu_notifier_invalidate_range_start+508
>     unmap_vmas+566
>     unmap_region+494
>     __do_munmap+1172
>     __vm_munmap+226
>     __x64_sys_munmap+98
>     do_syscall_64+64
>     entry_SYSCALL_64_after_hwframe+68
> ]: 1
> @[
>     kvm_set_pfn_dirty+1
>     __handle_changed_spte+2535
>     __handle_changed_spte+1746
>     __handle_changed_spte+1746
>     __handle_changed_spte+1746
>     __tdp_mmu_set_spte+396
>     zap_gfn_range+2229
>     kvm_tdp_mmu_put_root+465
>     mmu_free_root_page+537
>     kvm_mmu_free_roots+629
>     kvm_mmu_unload+28
>     kvm_arch_destroy_vm+510
>     kvm_put_kvm+1017
>     kvm_vcpu_release+78
>     __fput+516
>     task_work_run+206
>     do_exit+2615
>     do_group_exit+236
>     get_signal+1000
>     arch_do_signal_or_restart+580
>     exit_to_user_mode_prepare+300
>     syscall_exit_to_user_mode+25
>     do_syscall_64+77
>     entry_SYSCALL_64_after_hwframe+68
> ]: 2
> @[
>     kvm_set_pfn_dirty+1
>     __handle_changed_spte+2535
>     __tdp_mmu_set_spte+396
>     zap_gfn_range+2229
>     kvm_tdp_mmu_unmap_gfn_range+331
>     kvm_unmap_gfn_range+774
>     kvm_mmu_notifier_invalidate_range_start+743
>     __mmu_notifier_invalidate_range_start+508
>     zap_page_range_single+870
>     unmap_mapping_pages+434
>     shmem_fallocate+2518
>     vfs_fallocate+684
>     __x64_sys_fallocate+181
>     do_syscall_64+64
>     entry_SYSCALL_64_after_hwframe+68
> ]: 32
> @[
>     kvm_set_pfn_dirty+1
>     __handle_changed_spte+2535
>     __handle_changed_spte+1746
>     __handle_changed_spte+1746
>     __handle_changed_spte+1746
>     __tdp_mmu_set_spte+396
>     zap_gfn_range+2229
>     __kvm_tdp_mmu_zap_gfn_range+162
>     kvm_tdp_mmu_zap_all+34
>     kvm_mmu_zap_all+518
>     kvm_mmu_notifier_release+83
>     __mmu_notifier_release+420
>     exit_mmap+965
>     mmput+167
>     do_exit+2482
>     do_group_exit+236
>     get_signal+1000
>     arch_do_signal_or_restart+580
>     exit_to_user_mode_prepare+300
>     syscall_exit_to_user_mode+25
>     do_syscall_64+77
>     entry_SYSCALL_64_after_hwframe+68
> ]: 344
>
> That is, I never get a stack with
> kvm_tdp_mmu_put_root->..->kvm_set_pfn_dirty with a "good" run.
> Perhaps, this may shed some light onto what is going on.
>
> Ignat
