On 01/09/18 00:57 -0800, Liran Alon wrote:
>
> ----- haozhong.zhang@xxxxxxxxx wrote:
>
> > On 01/07/18 00:26 -0700, Ross Zwisler wrote:
> > > On Wed, Aug 23, 2017 at 10:21 PM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
> > > > From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> > > >
> > > > vmx_complete_interrupts() assumes that the exception is always injected,
> > > > so it would be dropped by kvm_clear_exception_queue(). This patch separates
> > > > exception.pending from exception.injected: exception.injected indicates that
> > > > the exception has been injected, or should be reinjected because a vmexit
> > > > occurred during event delivery in VMX non-root operation; exception.pending
> > > > indicates that the exception is queued and will be cleared when the exception
> > > > is injected into the guest. Together, exception.pending and exception.injected
> > > > ensure that an exception is not lost.
> > > >
> > > > Reported-by: Radim Krčmář <rkrcmar@xxxxxxxxxx>
> > > > Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > > > Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
> > > > Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> > > > ---
> > >
> > > I'm seeing a regression in my QEMU based NVDIMM testing system, and I
> > > bisected it to this commit.
> > >
> > > The behavior I'm seeing is that heavy I/O to simulated NVDIMMs in
> > > multiple virtual machines causes the QEMU guests to receive double
> > > faults, crashing them. Here's an example backtrace:
> > >
> > > [ 1042.653816] PANIC: double fault, error_code: 0x0
> > > [ 1042.654398] CPU: 2 PID: 30257 Comm: fsstress Not tainted 4.15.0-rc5 #1
> > > [ 1042.655169] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
> > > [ 1042.656121] RIP: 0010:memcpy_flushcache+0x4d/0x180
> > > [ 1042.656631] RSP: 0018:ffffac098c7d3808 EFLAGS: 00010286
> > > [ 1042.657245] RAX: ffffac0d18ca8000 RBX: 0000000000000fe0 RCX: ffffac0d18ca8000
> > > [ 1042.658085] RDX: ffff921aaa5df000 RSI: ffff921aaa5e0000 RDI: 000019f26e6c9000
> > > [ 1042.658802] RBP: 0000000000001000 R08: 0000000000000000 R09: 0000000000000000
> > > [ 1042.659503] R10: 0000000000000000 R11: 0000000000000000 R12: ffff921aaa5df020
> > > [ 1042.660306] R13: ffffac0d18ca8000 R14: fffff4c102a977c0 R15: 0000000000001000
> > > [ 1042.661132] FS:  00007f71530b90c0(0000) GS:ffff921b3b280000(0000) knlGS:0000000000000000
> > > [ 1042.662051] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 1042.662528] CR2: 0000000001156002 CR3: 000000012a936000 CR4: 00000000000006e0
> > > [ 1042.663093] Call Trace:
> > > [ 1042.663329]  write_pmem+0x6c/0xa0 [nd_pmem]
> > > [ 1042.663668]  pmem_do_bvec+0x15f/0x330 [nd_pmem]
> > > [ 1042.664056]  ? kmem_alloc+0x61/0xe0 [xfs]
> > > [ 1042.664393]  pmem_make_request+0xdd/0x220 [nd_pmem]
> > > [ 1042.664781]  generic_make_request+0x11f/0x300
> > > [ 1042.665135]  ? submit_bio+0x6c/0x140
> > > [ 1042.665436]  submit_bio+0x6c/0x140
> > > [ 1042.665754]  ? next_bio+0x18/0x40
> > > [ 1042.666025]  ? _cond_resched+0x15/0x40
> > > [ 1042.666341]  submit_bio_wait+0x53/0x80
> > > [ 1042.666804]  blkdev_issue_zeroout+0xdc/0x210
> > > [ 1042.667336]  ? __dax_zero_page_range+0xb5/0x140
> > > [ 1042.667810]  __dax_zero_page_range+0xb5/0x140
> > > [ 1042.668197]  ? xfs_file_iomap_begin+0x2bd/0x8e0 [xfs]
> > > [ 1042.668611]  iomap_zero_range_actor+0x7c/0x1b0
> > > [ 1042.668974]  ? iomap_write_actor+0x170/0x170
> > > [ 1042.669318]  iomap_apply+0xa4/0x110
> > > [ 1042.669616]  ? iomap_write_actor+0x170/0x170
> > > [ 1042.669958]  iomap_zero_range+0x52/0x80
> > > [ 1042.670255]  ? iomap_write_actor+0x170/0x170
> > > [ 1042.670616]  xfs_setattr_size+0xd4/0x330 [xfs]
> > > [ 1042.670995]  xfs_ioc_space+0x27e/0x2f0 [xfs]
> > > [ 1042.671332]  ? terminate_walk+0x87/0xf0
> > > [ 1042.671662]  xfs_file_ioctl+0x862/0xa40 [xfs]
> > > [ 1042.672035]  ? _copy_to_user+0x22/0x30
> > > [ 1042.672346]  ? cp_new_stat+0x150/0x180
> > > [ 1042.672663]  do_vfs_ioctl+0xa1/0x610
> > > [ 1042.672960]  ? SYSC_newfstat+0x3c/0x60
> > > [ 1042.673264]  SyS_ioctl+0x74/0x80
> > > [ 1042.673661]  entry_SYSCALL_64_fastpath+0x1a/0x7d
> > > [ 1042.674239] RIP: 0033:0x7f71525a2dc7
> > > [ 1042.674681] RSP: 002b:00007ffef97aa778 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > > [ 1042.675664] RAX: ffffffffffffffda RBX: 00000000000112bc RCX: 00007f71525a2dc7
> > > [ 1042.676592] RDX: 00007ffef97aa7a0 RSI: 0000000040305825 RDI: 0000000000000003
> > > [ 1042.677520] RBP: 0000000000000009 R08: 0000000000000045 R09: 00007ffef97aa78c
> > > [ 1042.678442] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
> > > [ 1042.679330] R13: 0000000000019e38 R14: 00000000000fcca7 R15: 0000000000000016
> > > [ 1042.680216] Code: 48 8d 5d e0 4c 8d 62 20 48 89 cf 48 29 d7 48 89 de 48 83 e6 e0
> > > 4c 01 e6 48 8d 04 17 4c 8b 02 4c 8b 4a 08 4c 8b 52 10 4c 8b 5a 18 <4c> 0f c3 00
> > > 4c 0f c3 48 08 4c 0f c3 50 10 4c 0f c3 58 18 48 83
> > >
> > > This appears to be independent of both the guest kernel version (this
> > > backtrace has v4.15.0-rc5, but I've seen it with other kernels) as well as
> > > independent of the host QEMU version (mine happens to be qemu-2.10.1-2.fc27
> > > in Fedora 27).
> > >
> > > The new behavior is due to this commit being present in the host OS kernel.
> > > Prior to this commit I could fire up 4 VMs and run xfstests on my simulated
> > > NVDIMMs, but after this commit such testing results in multiple of my VMs
> > > crashing almost immediately.
> > >
> > > Reproduction is very simple, at least on my development box. All you need
> > > are a pair of VMs (I just did it with clean installs of Fedora 27) with
> > > NVDIMMs. Here's a sample QEMU command to get one of these:
> > >
> > > # qemu-system-x86_64 /home/rzwisler/vms/Fedora27.qcow2 -m 4G,slots=3,maxmem=512G
> > >   -smp 12 -machine pc,accel=kvm,nvdimm -enable-kvm
> > >   -object memory-backend-file,id=mem1,share,mem-path=/home/rzwisler/nvdimms/nvdimm-1,size=17G
> > >   -device nvdimm,memdev=mem1,id=nv1
> > >
> > > In my setup the NVDIMM backing files (/home/rzwisler/nvdimms/nvdimm-1) are
> > > being created on a filesystem on an SSD.
> > >
> > > After these two qemu guests are up, run write I/Os to the resulting
> > > /dev/pmem0 devices. I've done this with xfstests and fio to get the error,
> > > but the simplest way is just:
> > >
> > > # dd if=/dev/zero of=/dev/pmem0
> > >
> > > The double fault should happen in under a minute, definitely before the dd
> > > processes run out of space on their /dev/pmem0 devices.
> > >
> > > I've reproduced this on multiple development boxes, so I'm pretty sure it's
> > > not related to a flaky hardware setup.
> >
> > Thanks for reporting this issue. I'll look into it.
> >
> > Haozhong
>
> I'm pretty confident this is unrelated to your scenario, but I have a fix for
> a regression in this commit which hasn't been merged yet ("KVM: nVMX: Fix bug
> of injecting L2 exception into L1"):
> https://www.spinics.net/lists/kvm/msg159062.html
>
> From my understanding, though, it doesn't look like you are running any
> nested VMs here, right? In that case, ignore my reply. I just wanted you to
> be aware of it in case I misunderstood something in your test setup.

It's not a nested VM. After a little debugging, I find that the guest double
fault is injected by KVM because KVM finds a pending page fault while it is
about to inject another one. I'm still investigating.

Thanks,
Haozhong
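
P.S. To make the suspected failure mode concrete: per the x86 double-fault
rules (SDM Table 6-5), a page fault raised while another page fault or a
contributory exception is still outstanding is promoted to #DF instead of
being delivered on its own. The snippet below is only a rough sketch of that
rule with invented names (classify, merge_exceptions); it is not the code in
arch/x86/kvm/x86.c.

/*
 * Sketch of the double-fault promotion rule: a stale pending #PF plus a
 * newly raised #PF is merged into a guest #DF, which matches the panic
 * seen in the guest log above.
 */
enum exc_class { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PAGE_FAULT };

static enum exc_class classify(int vector)
{
	switch (vector) {
	case 14:			/* #PF */
		return EXC_PAGE_FAULT;
	case 0:				/* #DE */
	case 10:			/* #TS */
	case 11:			/* #NP */
	case 12:			/* #SS */
	case 13:			/* #GP */
		return EXC_CONTRIBUTORY;
	default:
		return EXC_BENIGN;
	}
}

/* Decide which vector actually gets delivered to the guest. */
static int merge_exceptions(int pending_vector, int new_vector)
{
	enum exc_class first = classify(pending_vector);
	enum exc_class second = classify(new_vector);

	if ((first == EXC_CONTRIBUTORY && second == EXC_CONTRIBUTORY) ||
	    (first == EXC_PAGE_FAULT && second != EXC_BENIGN))
		return 8;		/* #DF */

	return new_vector;		/* otherwise deliver the new exception */
}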
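
The pending/injected split described in the quoted commit message can be
pictured roughly as follows. Again, this is only a sketch: exception_state,
queue_exception, inject_pending and complete_interrupts are invented for
illustration and are not the actual kvm_vcpu_arch fields or KVM functions.
The idea is that a vmexit which interrupted event delivery should only affect
the injected state, so a queued-but-not-yet-delivered exception is not
dropped; if the two flags get out of sync, a stale pending exception can
later be merged with a new page fault into the spurious #DF above.

#include <stdbool.h>

struct exception_state {
	bool pending;		/* queued, not yet delivered to the guest */
	bool injected;		/* programmed for delivery, or to be
				 * reinjected after a vmexit that
				 * interrupted event delivery */
	int nr;			/* exception vector */
	bool has_error_code;
	unsigned int error_code;
};

/* Raised while emulating the guest: only mark it pending. */
static void queue_exception(struct exception_state *e, int nr,
			    bool has_error_code, unsigned int error_code)
{
	e->pending = true;
	e->nr = nr;
	e->has_error_code = has_error_code;
	e->error_code = error_code;
}

/* Just before vmentry: turn a pending exception into an injected one. */
static void inject_pending(struct exception_state *e)
{
	if (e->pending) {
		e->pending = false;
		e->injected = true;
		/* ... program the event-injection field here ... */
	}
}

/* On vmexit: keep only the injected event; leave pending alone. */
static void complete_interrupts(struct exception_state *e,
				bool delivery_interrupted)
{
	e->injected = delivery_interrupted;	/* reinject on next vmentry */
	/* e->pending is intentionally untouched */
}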