Export offsets of VMCS fields as note information for kdump

Zhang Yanfei <zhangyanfei@xxxxxxxxxxxxxx> · Mon, 27 Aug 2012 15:06:23 +0800

Hello Avi,

Regarding the VMCSINFO patch: we really need this functionality in our development,
and YOSHIDA Masanori (masanori.yoshida.tv@xxxxxxxxxxx), a developer from Hitachi,
has said they need it too. Could you please tell us why the patch is unacceptable?
Do you dislike the whole idea of exporting VMCSINFO, or only the way we implemented
the patch? And do you have any suggestions on how to proceed?

Below is why we need this patch and how we will use it in our development.

We once hit an abnormal situation: a host scheduler bug caused a guest machine's
vcpu to stop for a long time, which then led to a heartbeat stop (the host was
still running).

We want an efficient way to analyse such bugs when we hit similar situations,
where a guest machine misbehaves because of something on the host machine.
These situations have actually happened many times, particularly during development.

So here is the requirement:
To find the root cause, we have to debug both the host side and the guest side.
For that we first need crash dumps of both the host machine and the guest machine,
and they must be taken at the same time, while the abnormal situation persists.
The only way to do this is to panic the host with the abnormal guest running on
it, so that the guest's image is contained in the host's crash dump.

Retrieving the guest's crash dump from the host's crash dump is therefore the key
step towards our goal. Unfortunately, in the kvm implementation some of the guest's
register values are kept in the vmcs, and the internal layout of the vmcs is not
disclosed by Intel. If we cannot retrieve these registers from the vmcs, the guest
crash dump we produce is incomplete, and some key information is missing when we
analyse it.

So we wrote this patch to export the vmcs internals. With the patch applied, we
can write the register values stored in the vmcs into the guest's crash dump,
which is what we want.
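
To illustrate the idea, here is a minimal sketch of how a userland tool might
read a guest register out of a raw vmcs region once the field offsets are known.
It is only an illustration, not code from the patch: struct vmcs_field_offset,
vmcs_read64() and the table layout are hypothetical names chosen for this
example, and the real offsets are CPU-specific and must come from the exported
note. Only the field encodings (GUEST_CR3, GUEST_RSP, GUEST_RIP) are the
architectural values. The sketch also assumes the tool runs on a little-endian
machine, like the host that produced the dump.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: one vmcs field encoding and the byte offset at which
 * the CPU keeps that field inside the vmcs region. */
struct vmcs_field_offset {
    uint32_t encoding;
    uint16_t offset;
};

/* Architectural vmcs field encodings for some guest-state fields. */
#define GUEST_CR3 0x6802u
#define GUEST_RSP 0x681cu
#define GUEST_RIP 0x681eu

/* Look up 'encoding' in the offset table and copy the 64-bit field out of the
 * raw vmcs region. Returns 0 on success, -1 if the field is not in the table
 * or its offset would run past the end of the region. */
static int vmcs_read64(const uint8_t *region, size_t region_size,
                       const struct vmcs_field_offset *table, size_t n,
                       uint32_t encoding, uint64_t *value)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].encoding != encoding)
            continue;
        if ((size_t)table[i].offset + sizeof(*value) > region_size)
            return -1;
        memcpy(value, region + table[i].offset, sizeof(*value));
        return 0;
    }
    return -1;
}
```

A tool would fill the table from the note information in the host's crash dump,
locate the guest's vmcs region in the dump, and then call vmcs_read64() for each
register it wants to put into the guest's crash dump.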

If a bug is found in a customer's environment, we have two ways to avoid
affecting other guest machines running on the same host. First, we could
reproduce the buggy situation in another environment and do the bug analysis
there. Second, we could migrate the other guest machines to other hosts.

After the abnormal situation is reproduced, we panic the host *manually*.
Then we can use userland tools to extract the guest machine's crash dump from the
host machine's crash dump, using the feature provided by this patch. Finally, we
can analyse the two dumps separately to find out which side causes the problem.