On 08/27/2012 10:06 AM, Zhang Yanfei wrote:
> Hello Avi,

Hi,

>
> About this VMCSINFO patch, we really need this functionality in our development.
> And YOSHIDA Masanori(masanori.yoshida.tv@xxxxxxxxxxx), the developer from Hitachi,
> has said they need this too. So could you please tell us why the patch is unacceptable?
> You dislike the whole export-VMCSINFO-thing in all, or you just dislike the way
> we implement the patch? Finally do you have any suggestion about all this?
>
> Below is why we need this patch and how we will use this patch in our development.
>
> We once came to an abnormal situation: a host scheduler bug caused guest machine's
> vcpu stopped for a long time and then led to heartbeat stop (host is still running).
>
> We want to have an efficient way to make the bug analysis when we come to the similar
> situations where guest machine doesn't work well due to something of host machine's.
> Actually, these situations have happened many times, in particular, under development.

Is there a different way to solve this issue?  Panicking both guest and
host seems to be a heavy hammer.  Alternatives include passing guest
traces to the host, or extending sysrq-t to dump guest information.

>
> So here comes the requirement:
> If we want to find the root cause, we should debug both host machine's and guest
> machine's sides. But first we should get both host machine's crash dump and guest
> machine's crash dump and they must be dumped at the same time when the abnormal
> situation remains. So the only way to do this is to panic the host with the abnormal
> guest running on it and then the guest's image is contained in host's crash dump.
>
> Logically, retrieving guest's crash dump from the host's crash dump is the very
> important step to accomplish our goal. Unfortunately, in kvm implementation, some
> registers' values of the guest are hidden in vmcs, and vmcs internal is hidden by
> Intel.
> If we could not retrieve these registers from the vmcs, the guest crash dump
> we make is incomplete, and some key information is lost when we analyse the guest
> crash dump.
>
> So we make this patch to export the vmcs internal. With the patch applied, we
> could write registers' values stored in vmcs into guest's crash dump. And that's
> what we want.
>
> If a bug was found on customer's environment, we have two ways to avoid
> affecting other guest machines running on the same host. First, we could do bug
> analysis on another environment to reproduce the buggy situation; Second, we
> could migrate other guest machines to other hosts.
>
> After the abnormal situation is reproduced, we panic the host *manually*.
> Then we could use userland tools to get guest machine's crash dump from host machine's
> with the feature provided by this patch. Finally we could analyse them separately
> to find which side causes the problem.

I'm not happy with reverse engineering the vmcs encoding.  First, it's
just begging to be broken with some future processor revision; second,
the processor may cache some fields on chip, which renders the dump data
inaccurate.

I haven't looked at the patch itself recently, but I believe it can be
made non-intrusive (if it isn't already), so no objections on that
score.  But I would prefer a solution that doesn't violate the spec.

-- 
error compiling committee.c: too many arguments to function