Hi gengdongjiu,

(+CC: Achin)

On 21/06/17 08:42, gengdongjiu wrote:
> On 2017/5/25 0:32, James Morse wrote:
>> +static void kvm_send_hwpoison_signal(unsigned long address,
>> +				     struct vm_area_struct *vma)
>> +{
>> +	siginfo_t info;
>> +
>> +	info.si_signo = SIGBUS;
>> +	info.si_errno = 0;
>> +	info.si_code = BUS_MCEERR_AR;
>> +	info.si_addr = (void __user *)address;
>> +
>> +	if (is_vm_hugetlb_page(vma))
>> +		info.si_addr_lsb = huge_page_shift(hstate_vma(vma));
>> +	else
>> +		info.si_addr_lsb = PAGE_SHIFT;
>> +
>> +	send_sig_info(SIGBUS, &info, current);
>> +}
>> +
>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  			  struct kvm_memory_slot *memslot, unsigned long hva,
>>  			  unsigned long fault_status)
>> @@ -1318,6 +1337,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	smp_rmb();
>>
>>  	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
>> +	if (pfn == KVM_PFN_ERR_HWPOISON) {
>> +		kvm_send_hwpoison_signal(hva, vma);
>> +		return 0;
>> +	}

> I heard from our CPU hardware team, when happen HWpoison, CPU hardware does
> not record the IPA address in the HPFAR_EL2.

I think we discussed this before[0]: your CPU has a feature called 'hwpoison'
that it uses to support RAS.
Linux also has a feature called 'hwpoison' [1][2], which handles the
offline-ing of memory pages when it receives a notification through APEI.
I've tried to refer to the Linux feature as memory_failure() to avoid this
confusion.

This patch handles stage2 faults on a page that the memory_failure() code has
removed from the stage2 mapping. v3 of this patch[3] does a much better job of
describing this.

(... I don't think your question is related to this patch ...)

> Only when the SEA error is related to the page table walk, the HPFAR_EL2
> register is updated.
> here we got the pfn/gfn from the register HPFAR_EL2, if CPU does not update
> the HPFAR_EL2 register,
> we may can not use this method to get the pfn/gfn.

This patch is only concerned with ordinary stage2 faults. Linux doesn't want
to give the page to the guest as it has been marked with the PG_HWpoison flag.
Instead we trigger memory_failure()'s 'late' notification. To the CPU this will
look like a normal stage2 fault.

> could you confirm arm's armv8.0/armv8.2 standard CPU also use such design?
> if so, we may need to use other method to get the gfn/pfn/hva address.

Your question is what happens when a guest accesses a location your CPU has
marked with its 'hwpoison'.

From the RAS spec[4] I would expect this to be reported as a Synchronous
External Abort. For firmware-first error handling this should be taken to EL3.
From there, firmware should generate CPER records and then notify the OS via
one of the APEI mechanisms. Because a guest was running, this notification must
reach EL2. KVM can then switch back to the host and invoke the APEI GHES
handler, which for this kind of error will probably call memory_failure().

What happens if HPFAR_EL2 isn't set when this kind of error occurs?
Provided EL3 can learn the physical address that triggered the exception, this
isn't a problem for firmware-first. These errors should be taken to EL3 and
then described to the OS by CPER records. The OS should process the CPER
records in preference to attempting 'kernel first error handling'.

memory_failure() will unmap the affected pages from user space (and KVM's
stage2) and deliver signals (if Qemu/kvmtool registered for them via prctl()).
KVM won't re-enter the guest while a signal is pending. From there Qemu/kvmtool
get the affected VA from the signal, which they can use to notify the guest.
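
As a rough illustration of the user space side described above (this is not
part of the patch, and not how Qemu or kvmtool actually implement it), a VMM
might register with prctl() and decode the SIGBUS along these lines. The names
poison_report, poison_range(), sigbus_handler() and install_poison_handler()
are made up for this sketch. It assumes Linux with glibc, and that si_addr_lsb
carries the page shift as set by kvm_send_hwpoison_signal() in the quoted hunk.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical record of the last memory error reported to this process. */
struct poison_report {
	uintptr_t base;	/* start of the affected range (a host VA) */
	size_t len;	/* size of the affected range in bytes */
	int code;	/* BUS_MCEERR_AR or BUS_MCEERR_AO */
};

static volatile sig_atomic_t poison_seen;
static struct poison_report last_report;

/*
 * si_addr_lsb is the least significant valid bit of si_addr, so the affected
 * range is 2^lsb bytes, aligned down to that size. Assumes lsb is smaller
 * than the width of uintptr_t.
 */
static struct poison_report poison_range(uintptr_t addr, unsigned int lsb,
					 int code)
{
	struct poison_report r;
	uintptr_t size = (uintptr_t)1 << lsb;

	r.base = addr & ~(size - 1);
	r.len = (size_t)size;
	r.code = code;
	return r;
}

static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)ucontext;

	if (info->si_code != BUS_MCEERR_AR && info->si_code != BUS_MCEERR_AO) {
		/* Not a memory error: fall back to the default action. */
		signal(sig, SIG_DFL);
		raise(sig);
		return;
	}

	/* Record the range; the main loop would translate the VA to a GPA. */
	last_report = poison_range((uintptr_t)info->si_addr,
				   (unsigned int)info->si_addr_lsb,
				   info->si_code);
	poison_seen = 1;
}

/* Returns 0 on success, -1 if sigaction() or prctl() failed. */
static int install_poison_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, NULL))
		return -1;

	/* Ask memory_failure() to notify this process early. */
	return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}
```

A VMM would call install_poison_handler() once at startup, then check
poison_seen after KVM_RUN returns. For a 4K page si_addr_lsb would be 12, for a
2M hugetlb page it would be 21, which is why the range has to be computed from
the shift rather than assumed to be one page.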

I hope this helps,

James


[0] https://lkml.org/lkml/2017/5/12/492
[1] https://lwn.net/Articles/348886/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/hwpoison.txt
[3] https://www.spinics.net/lists/arm-kernel/msg589515.html
[4] https://static.docs.arm.com/ddi0587/a/RAS%20Extension-release%20candidate_march_29.pdf
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm