On Thursday 12 January 2017 02:35 PM, Balbir Singh wrote:
> On Mon, Jan 09, 2017 at 05:10:45PM +0530, Aravinda Prasad wrote:

[ . . . ]

>> The reasons for this approach is (i) it is not possible
>> to distinguish whether the exception occurred in the
>> guest or the host from the pt_regs passed on the
>> machine_check_exception(). Hence machine_check_exception()
>> calls panic, instead of passing on the exception to
>> the guest, if the machine check exception is not
>> recoverable. (ii) the approach introduced in this
>> patch gives opportunity to the host kernel to perform
>> actions in virtual mode before passing on the exception
>> to the guest. This approach does not require complex
>> tweaks to machine_check_fwnmi and friends.
>
> It would be good to qualify the different types of MCE
> and what action we expect across hypervisor and guest.

The hypervisor performs actions depending on the type of MCE (SLB multihit, UEs, etc.). If the hypervisor is unable to recover from the MCE and the address in error belongs to the guest, this patch set forwards the error to the guest kernel for handling.

The main goal of this patch set is to pass unrecoverable MCEs that hit the guest address space on to the guest kernel, instead of crashing the hypervisor. The actions taken by the hypervisor and the guest kernel upon an MCE remain unchanged.

[ . . . ]

>
> Shouldn't the host take action for example poison bad pages?
>

In the case of a UE, only a few bytes of a page are affected, so we want to give the guest kernel a chance to recover the clean part of the page before it is poisoned. Hence we don't immediately poison the bad pages in the host.

The guest kernel is expected to poison the bad pages after performing its recovery action, which prevents the guest from reusing them. However, what is still missing is a way for the guest to tell the host when it is done with the recovery.
This notification is mainly needed to prevent the host from reusing the bad pages when the guest shuts down, reboots, crashes, or migrates. We plan to address this in a separate patch set.

Regards,
Aravinda

>>  if (opal_recover_mce(regs, &evt))
>>          return 1;
>>
>
> Balbir Singh