Marc Zyngier <marc.zyngier@xxxxxxx> writes:

> On 27/03/17 14:31, Punit Agrawal wrote:
>> Christoffer Dall <cdall@xxxxxxxxxx> writes:
>>
>>> On Mon, Mar 27, 2017 at 01:00:56PM +0100, James Morse wrote:
>>>> Hi guys,
>>>>
>>>> On 27/03/17 12:20, Punit Agrawal wrote:
>>>>> Christoffer Dall <cdall@xxxxxxxxxx> writes:
>>>>>> On Wed, Mar 15, 2017 at 04:07:27PM +0000, James Morse wrote:
>>>>>>> Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64[0], notifications for
>>>>>>> broken memory can call memory_failure() in mm/memory-failure.c to deliver
>>>>>>> SIGBUS to any user space process using the page, and notify all the
>>>>>>> in-kernel users.
>>>>>>>
>>>>>>> If the page corresponded with guest memory, KVM will unmap this page
>>>>>>> from its stage2 page tables. The user space process that allocated
>>>>>>> this memory may have never touched this page in which case it may not
>>>>>>> be mapped meaning SIGBUS won't be delivered.
>>>>>>>
>>>>>>> When this happens KVM discovers pfn == KVM_PFN_ERR_HWPOISON when it
>>>>>>> comes to process the stage2 fault.
>>>>>>>
>>>>>>> Do as x86 does, and deliver the SIGBUS when we discover
>>>>>>> KVM_PFN_ERR_HWPOISON. Use the stage2 mapping size as the si_addr_lsb
>>>>>>> as this matches the user space mapping size.
>>>>
>>>>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>>>>> index 962616fd4ddd..9d1aa294e88f 100644
>>>>>>> --- a/arch/arm/kvm/mmu.c
>>>>>>> +++ b/arch/arm/kvm/mmu.c
>>>>>>> @@ -20,8 +20,10 @@
>>>>>>>  #include <linux/kvm_host.h>
>>>>>>>  #include <linux/io.h>
>>>>>>>  #include <linux/hugetlb.h>
>>>>>>> +#include <linux/sched/signal.h>
>>>>>>>  #include <trace/events/kvm.h>
>>>>>>>  #include <asm/pgalloc.h>
>>>>>>> +#include <asm/siginfo.h>
>>>>>>>  #include <asm/cacheflush.h>
>>>>>>>  #include <asm/kvm_arm.h>
>>>>>>>  #include <asm/kvm_mmu.h>
>>>>>>> @@ -1237,6 +1239,23 @@ static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
>>>>>>>  	__coherent_cache_guest_page(vcpu, pfn, size);
>>>>>>>  }
>>>>>>>
>>>>>>> +static void kvm_send_hwpoison_signal(unsigned long address, bool hugetlb)
>>>>>>> +{
>>>>>>> +	siginfo_t info;
>>>>>>> +
>>>>>>> +	info.si_signo = SIGBUS;
>>>>>>> +	info.si_errno = 0;
>>>>>>> +	info.si_code = BUS_MCEERR_AR;
>>>>>>> +	info.si_addr = (void __user *)address;
>>>>>>> +
>>>>>>> +	if (hugetlb)
>>>>>>> +		info.si_addr_lsb = PMD_SHIFT;
>>>>>>> +	else
>>>>>>> +		info.si_addr_lsb = PAGE_SHIFT;
>>>>>>> +
>>>>>>> +	send_sig_info(SIGBUS, &info, current);
>>>>>>> +}
>>>>>>> +
>>>>>>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>>>>  			  struct kvm_memory_slot *memslot, unsigned long hva,
>>>>>>>  			  unsigned long fault_status)
>>>>>>> @@ -1306,6 +1325,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>>>>  	smp_rmb();
>>>>>>>
>>>>>>>  	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
>>>>>>> +	if (pfn == KVM_PFN_ERR_HWPOISON) {
>>>>>>> +		kvm_send_hwpoison_signal(hva, hugetlb);
>>>>>>
>>>>>> The way this is called means that we'll only notify userspace of a huge
>>>>>> mapping if userspace is mapping hugetlbfs, and not because the stage2
>>>>>> mapping may or may not have used transparent huge pages when the error
>>>>>> was discovered. Is this the desired semantics?
>>>>
>>>> No,
>>>>
>>>>
>>>>> I think so.
>>>>>
>>>>> AFAIUI, transparent hugepages are split before being poisoned while all
>>>>> the underlying pages of a hugepage are poisoned together, i.e., no
>>>>> splitting.
>>>>
>>>> In which case I need to look into this some more!
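
FWIW, if transparent hugepages really are split before they get
poisoned, then hugetlbfs is the only case where a shift bigger than
PAGE_SHIFT matters, and that shift could be taken from the userspace
VMA instead of hard-coding PMD_SHIFT. A rough, untested sketch of what
I mean (not the patch above; it reuses the existing kernel helpers
is_vm_hugetlb_page()/hstate_vma()/huge_page_shift(), which are already
available via <linux/hugetlb.h> in mmu.c, and it assumes
user_mem_abort() can still safely hand down the vma it looked up
earlier, which would need checking):

static void kvm_send_hwpoison_signal(unsigned long address,
				     struct vm_area_struct *vma)
{
	siginfo_t info;

	info.si_signo = SIGBUS;
	info.si_errno = 0;
	info.si_code  = BUS_MCEERR_AR;
	info.si_addr  = (void __user *)address;

	/*
	 * hugetlbfs pages are poisoned as a whole, so report the whole
	 * userspace mapping size; THP is split first, so PAGE_SHIFT is
	 * enough for everything else.
	 */
	if (is_vm_hugetlb_page(vma))
		info.si_addr_lsb = huge_page_shift(hstate_vma(vma));
	else
		info.si_addr_lsb = PAGE_SHIFT;

	send_sig_info(SIGBUS, &info, current);
}

That way the reported extent always matches the userspace mapping,
whatever hugepage size hugetlbfs happens to be using.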
>>>>
>>>> My thinking was we should report the size that was knocked out of the stage2 to
>>>> avoid the guest repeatedly faulting until it has touched every guest-page-size
>>>> in the stage2 hole.
>>>
>>> By signaling something at the fault path, I think it's going to be very
>>> hard to backtrack what the stage 2 page tables looked like when faults
>>> started happening, because I think these are completely decoupled events
>>> (the mmu notifier and the later fault).
>>>
>>>>
>>>> Reading the code in kvm/mmu.c, it looked like the mapping sizes would always
>>>> be the same as those used by userspace.
>>>
>>> I think the mapping sizes should be the same between userspace and KVM,
>>> but the mapping size of a particular page (and associated pages) may
>>> vary over time.
>>
>> Stage 1 and Stage 2 support different hugepage sizes. A larger size
>> stage 1 page maps to multiple stage 2 page table entries. For stage 1,
>> we support PUD_SIZE, CONT_PMD_SIZE, PMD_SIZE and CONT_PTE_SIZE while
>> only PMD_SIZE is supported for Stage 2.
>
> What is stage-1 doing here? We have no idea about what stage-1 is doing
> (not under KVM's control). Or do you mean userspace instead?

I mean userspace here. Sorry for the confusion.

>
> Thanks,
>
> M.

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
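
For reference, on the userspace side the VMM consumes this signal
roughly as below. This is only an illustrative sketch (not taken from
QEMU or from the patch): it assumes a glibc that exposes si_addr_lsb
and BUS_MCEERR_AR, and a real VMM would unmap or replace the poisoned
range and inject the error into the guest rather than just log it.

#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Sketch of a VMM's SIGBUS handler for hwpoison notifications. */
static void sigbus_handler(int sig, siginfo_t *si, void *ctx)
{
	(void)sig;
	(void)ctx;

	if (si->si_code == BUS_MCEERR_AR) {
		/* si_addr_lsb encodes the size of the poisoned mapping. */
		size_t len = (size_t)1 << si->si_addr_lsb;

		/*
		 * A real VMM would handle [si_addr, si_addr + len) here;
		 * fprintf() is not async-signal-safe but keeps the sketch
		 * short.
		 */
		fprintf(stderr, "hwpoison at %p, %zu bytes\n", si->si_addr, len);
		return;
	}

	abort();	/* not a memory error we know how to handle */
}

int main(void)
{
	struct sigaction sa = {
		.sa_sigaction = sigbus_handler,
		.sa_flags = SA_SIGINFO,
	};

	sigemptyset(&sa.sa_mask);
	sigaction(SIGBUS, &sa, NULL);

	/*
	 * ... create the VM and run the vcpus; with the patch above, a
	 * stage2 fault on hwpoisoned guest memory now results in this
	 * SIGBUS being delivered to the vcpu thread ...
	 */
	pause();
	return 0;
}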