Re: [PATCH v2 11/27] KVM: x86/mmu: Zap only the relevant pages when removing a memslot

On Tue, 20 Aug 2019 14:02:45 -0700
Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:

> On Tue, Aug 20, 2019 at 02:42:04PM -0600, Alex Williamson wrote:
> > On Tue, 20 Aug 2019 13:03:19 -0700
> > Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:  
> > > All that being said, it doesn't explain why gfns like 0xfec00 and 0xfee00
> > > were sensitive to (lack of) zapping.  My theory is that zapping what were
> > > effectively random-but-interesting shadow pages cleaned things up enough
> > > to avoid noticeable badness.
> > > 
> > > 
> > > Alex,
> > > 
> > > Can you please test the attached patch?  It implements a very slimmed down
> > > version of kvm_mmu_zap_all() to zap only shadow pages that can hold sptes
> > > pointing at the memslot being removed, which was the original intent of
> > > kvm_mmu_invalidate_zap_pages_in_memslot().  I apologize in advance if it
> > > crashes the host.  I'm hopeful it's correct, but given how broken the
> > > previous version was, I'm not exactly confident.  
> > 
> > It doesn't crash the host, but the guest is not happy, failing to boot
> > the desktop in one case and triggering errors in the guest w/o even
> > running test programs in another case.  Seems like it might be worse
> > than previous.  Thanks,  
> 
> Hrm, I'm back to being completely flummoxed.
> 
> Would you be able to generate a trace of all events/kvmmmu, using the
> latest patch?  I'd like to rule out a stupid code bug if it's not too
> much trouble.

I tried to simplify the patch, making it closer to zap_all, so I
removed the max_level calculation and exclusion based on sp->role.level,
as well as the gfn range filtering.  For good measure I even removed
the sp->root_count test, so any sp not marked invalid is zapped.  This
works, and I can also add back the sp->root_count test and things
remain working.

From there I added back the gfn range test, but I left out the gfn_mask
because I'm not doing the level filtering and I think(?) this is just
another optimization.  So essentially I only add:

	if (sp->gfn < slot->base_gfn ||
	    sp->gfn > (slot->base_gfn + slot->npages - 1))
		continue;

Not only does this not work, the host will sometimes oops:

[  808.541168] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  808.555065] #PF: supervisor read access in kernel mode
[  808.565326] #PF: error_code(0x0000) - not-present page
[  808.575588] PGD 0 P4D 0 
[  808.580649] Oops: 0000 [#1] SMP PTI
[  808.587617] CPU: 3 PID: 1965 Comm: CPU 0/KVM Not tainted 5.3.0-rc4+ #4
[  808.600652] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[  808.618907] RIP: 0010:gfn_to_rmap+0xd9/0x120 [kvm]
[  808.628472] Code: c7 48 8d 0c 80 48 8d 04 48 4d 8d 14 c0 49 8b 02 48 39 c6 72 15 49 03 42 08 48 39 c6 73 0c 41 89 b9 08 b4 00 00 49 8b 3a eb 0b <48> 8b 3c 25 00 00 00 00 45 31 d2 0f b6 42 24 83 e0 0f 83 e8 01 8d
[  808.665945] RSP: 0018:ffffa888009a3b20 EFLAGS: 00010202
[  808.676381] RAX: 00000000000c1040 RBX: ffffa888007d5000 RCX: 0000000000000014
[  808.690628] RDX: ffff8eadd0708260 RSI: 00000000000c1080 RDI: 0000000000000004
[  808.704877] RBP: ffff8eadc3d11400 R08: ffff8ead97cf0008 R09: ffff8ead97cf0000
[  808.719124] R10: ffff8ead97cf0168 R11: 0000000000000004 R12: ffff8eadd0708260
[  808.733374] R13: ffffa888007d5000 R14: 0000000000000000 R15: 0000000000000004
[  808.747620] FS:  00007f28dab7c700(0000) GS:ffff8eb19f4c0000(0000) knlGS:0000000000000000
[  808.763776] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  808.775249] CR2: 0000000000000000 CR3: 000000003f508006 CR4: 00000000001626e0
[  808.789499] Call Trace:
[  808.794399]  drop_spte+0x77/0xa0 [kvm]
[  808.801885]  mmu_page_zap_pte+0xac/0xe0 [kvm]
[  808.810587]  __kvm_mmu_prepare_zap_page+0x69/0x350 [kvm]
[  808.821196]  kvm_mmu_invalidate_zap_pages_in_memslot+0x87/0xf0 [kvm]
[  808.833881]  kvm_page_track_flush_slot+0x55/0x80 [kvm]
[  808.844140]  __kvm_set_memory_region+0x821/0xaa0 [kvm]
[  808.854402]  kvm_set_memory_region+0x26/0x40 [kvm]
[  808.863971]  kvm_vm_ioctl+0x59a/0x940 [kvm]
[  808.872318]  ? pagevec_lru_move_fn+0xb8/0xd0
[  808.880846]  ? __seccomp_filter+0x7a/0x680
[  808.889028]  do_vfs_ioctl+0xa4/0x630
[  808.896168]  ? security_file_ioctl+0x32/0x50
[  808.904695]  ksys_ioctl+0x60/0x90
[  808.911316]  __x64_sys_ioctl+0x16/0x20
[  808.918807]  do_syscall_64+0x5f/0x1a0
[  808.926121]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  808.936209] RIP: 0033:0x7f28ebf2b0fb

Does this suggest something is still fundamentally wrong with the
premise of this change, or have I done something stupid?  Thanks,

Alex
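
For reference, the gfn range test quoted above can be exercised outside
the kernel with a minimal sketch.  The struct and function names below
(slot_sketch, sp_sketch, sp_gfn_in_slot) are hypothetical stand-ins
that carry only the fields the check reads; they are not the kernel's
definitions, and this is an illustration rather than the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Hypothetical stand-in: only the memslot fields the check reads. */
struct slot_sketch {
	gfn_t base_gfn;
	unsigned long npages;
};

/* Hypothetical stand-in: only the shadow page field the check reads. */
struct sp_sketch {
	gfn_t gfn;
};

/*
 * Inverse of the "continue" condition in the snippet: true when
 * sp->gfn lies in [base_gfn, base_gfn + npages - 1].
 */
bool sp_gfn_in_slot(const struct sp_sketch *sp,
		    const struct slot_sketch *slot)
{
	return !(sp->gfn < slot->base_gfn ||
		 sp->gfn > (slot->base_gfn + slot->npages - 1));
}

/* Boundary cases for a slot covering gfns 0x100 through 0x10f. */
void sp_gfn_in_slot_selfcheck(void)
{
	struct slot_sketch slot = { .base_gfn = 0x100, .npages = 0x10 };
	struct sp_sketch first = { .gfn = 0x100 };
	struct sp_sketch past = { .gfn = 0x110 };

	assert(sp_gfn_in_slot(&first, &slot));
	assert(!sp_gfn_in_slot(&past, &slot));
}
```

As in the snippet, the sketch compares only the single gfn stored in
the page against the slot bounds; it makes no attempt to account for
role.level or a gfn_mask.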


