[Problem] ndctl command hangs forever when reinitializing a pmem device after the VM is destroyed

"zhangsha (A)" <zhangsha.zhang@xxxxxxxxxx> · Fri, 10 Aug 2018 03:49:23 +0000

Hi all,
Unfortunately, the ndctl process gets stuck in D state (uninterruptible sleep) when I try to reinitialize the dax device after the VM has been destroyed.

Stack of the hung ndctl process:
[<ffffffffa02c0029>] dax_pmem_percpu_kill+0x29/0x50 [dax_pmem]
[<ffffffff81454715>] devm_action_release+0x15/0x20
[<ffffffff814552cf>] release_nodes+0x1cf/0x220
[<ffffffff8145542c>] devres_release_all+0x3c/0x60
[<ffffffff81450bea>] __device_release_driver+0x8a/0xf0
[<ffffffff81450c73>] device_release_driver+0x23/0x30
[<ffffffff8144f647>] driver_unbind+0xf7/0x120
[<ffffffff8144ea87>] drv_attr_store+0x27/0x40
[<ffffffff81295ecb>] sysfs_write_file+0xcb/0x140
[<ffffffff812159e0>] vfs_write+0xc0/0x1f0
[<ffffffff8121650f>] SyS_write+0x7f/0xe0
[<ffffffff816c22ef>] system_call_fastpath+0x1c/0x21
[<ffffffffffffffff>] 0xffffffffffffffff

I can reproduce this problem reliably with the following steps:
1) Initialize the device: "ndctl create-namespace --mode dax --map=mem -e namespace0.0 -f"
2) Create the VM with the following command and wait for the guest OS to boot:
   /usr/bin/qemu-kvm -name guest=suse12sp2-wj,debug-threads=on \
     -machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off,nvdimm=on \
     -cpu host,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff \
     -m size=16777216k,slots=4,maxmem=75497472k \
     -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 \
     -numa node,nodeid=0,cpus=0-3,mem=16384 \
     -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/dev/dax0.0,share=yes,size=8587837440,align=2097152 \
     -device nvdimm,node=0,label-size=131072,memdev=memnvdimm0,id=nvdimm0,slot=0 \
     -uuid 39ce74f4-9cb6-49cf-8890-949864ee1a99 -no-user-config -nodefaults \
     -rtc base=utc -no-hpet -no-shutdown -boot menu=on,strict=on \
     -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x7 \
     -device pci-bridge,chassis_nr=1,id=pci.2,bus=pci.0,addr=0x8 \
     -device pci-bridge,chassis_nr=1,id=pci.3,bus=pci.0,addr=0x9 \
     -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
     -device virtio-scsi-pci,id=scsi0,bus=pci.3,addr=0x1 \
     -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x19 \
     -drive file=/Images/zsha/images/EulerOS310.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=threads \
     -device virtio-blk-pci,scsi=off,bus=pci.2,addr=0x1,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
     -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -k en-us \
     -device cirrus-vga,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 \
     -device ivshmem,id=ivshmem0,shm=i-00000006.kboxram,size=16m,role=master,bus=pci.0,addr=0x3 \
     -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x1e -device pvpanic \
     -msg timestamp=on -vnc :9
3) Destroy the VM: "kill -15 `pidof qemu-kvm`"
4) Reinitialize the device; this time the command hangs: "ndctl create-namespace --mode dax --map=mem -e namespace0.0 -f"

I've tested this with a CentOS 3.10.0-862 kernel, a Fedora 4.16.x kernel and an upstream 4.18.0-rc6 kernel; all of them exhibit the same behavior.

By adding some logging, I found that gup_pte_range() (through get_page() -> get_zone_device_page()) increases the refcount of device dax0.0 to 161 while the VM starts. However, when the VM is destroyed, zap_pte_range() gets a NULL page from vm_normal_page(), so those references are never dropped and the refcount cannot return to zero. Because of this, in dax_pmem_percpu_kill() (dax_pmem_percpu_exit()), percpu_ref_put() never reaches the branch that releases the device, and wait_for_completion() never returns.

Stack where the refcount of dax0.0 is increased:
[<ffffffff81072c90>] gup_pte_range+0x170/0x380
[<ffffffff8107312f>] gup_pud_range+0x12f/0x1e0
[<ffffffff8107339b>] __get_user_pages_fast+0xcb/0x140
[<ffffffffa057695b>] __gfn_to_pfn_memslot+0x46b/0x490 [kvm]
[<ffffffffa0593e2e>] try_async_pf+0x6e/0x2a0 [kvm]
[<ffffffffa0578dd8>] ? kvm_host_page_size+0x88/0x90 [kvm]
[<ffffffffa059b66a>] tdp_page_fault+0x13a/0x280 [kvm]
[<ffffffffa053c663>] ? vmx_vcpu_run+0x2f3/0xa40 [kvm_intel]
[<ffffffffa059570a>] kvm_mmu_page_fault+0x2a/0x140 [kvm]
[<ffffffffa0532346>] handle_ept_violation+0x96/0x170 [kvm_intel]
[<ffffffffa053ab7c>] vmx_handle_exit+0x2bc/0xc40 [kvm_intel]
[<ffffffffa053c66f>] ? vmx_vcpu_run+0x2ff/0xa40 [kvm_intel]
[<ffffffffa053c663>] ? vmx_vcpu_run+0x2f3/0xa40 [kvm_intel]
[<ffffffffa053c66f>] ? vmx_vcpu_run+0x2ff/0xa40 [kvm_intel]
[<ffffffffa053c663>] ? vmx_vcpu_run+0x2f3/0xa40 [kvm_intel]
[<ffffffffa0538ec8>] ? vmx_hwapic_irr_update+0xb8/0xc0 [kvm_intel]
[<ffffffffa0589b21>] vcpu_enter_guest+0x7d1/0x1300 [kvm]
[<ffffffffa05913b8>] kvm_arch_vcpu_ioctl_run+0x328/0x480 [kvm]
[<ffffffffa0577191>] kvm_vcpu_ioctl+0x2b1/0x660 [kvm]
[<ffffffff81229ec8>] do_vfs_ioctl+0x2e8/0x4d0
[<ffffffff8122a151>] SyS_ioctl+0xa1/0xc0
[<ffffffff816c22ef>] system_call_fastpath+0x1c/0x21

Any reply will be appreciated, and thanks for all your help.

B.R.
Sha Zhang