On 02/22/2017 05:15 AM, Paolo Bonzini wrote:
On 22/02/2017 04:08, Chris Friesen wrote:
On 02/19/2017 10:38 PM, Han, Huaitong wrote:
Hi, Gaohuai
I tried to debug the problem, and I found the indirect cause may be that
the rmap value is not cleared when a KVM MMU page is freed. I have read the
code but have not found the root cause. Can you reliably reproduce the issue?
Many guesses need to be verified.
In both cases it seems to have been triggered by repeatedly
live-migrating a KVM virtual machine between two hypervisors with
Broadwell CPUs running the latest CentOS 7.
It's a race of some sort; it doesn't happen every time.
Can you reproduce it with kernel 4.8+? I'm suspecting commit
4e59516a12a6 ("kvm: vmx: ensure VMCS is current while enabling PML",
2016-07-14) to be the fix.
I can't easily try with a newer kernel; the software package we're using has
kernel patches that would have to be ported.
I'm at a conference and don't really have time to set up a pair of test machines
from scratch with a custom kernel.
Chris