On Mon, 13 May 2019 at 13:46, Dongjiu Geng <gengdongjiu@xxxxxxxxxx> wrote:
>
> Add SIGBUS signal handler. This handler checks the SIGBUS type,
> translates the host VA delivered by host to guest PA, then fills this PA
> into guest APEI GHES memory, then notifies guest according to the SIGBUS type.
>
> If guest accesses the poisoned memory, it generates Synchronous External
> Abort(SEA). Then host kernel gets an APEI notification and calls memory_failure()
> to unmap the affected page from the guest's stage 2, finally returning
> to guest.
>
> If guest continues to access the PG_hwpoison page, it will trap to KVM as a stage2 fault,
> then a SIGBUS_MCEERR_AR synchronous signal is delivered to Qemu, Qemu records this
> error address into guest APEI GHES memory and notifies guest using
> Synchronous-External-Abort(SEA).
>
> Suggested-by: James Morse <james.morse@xxxxxxx>
> Signed-off-by: Dongjiu Geng <gengdongjiu@xxxxxxxxxx>

> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> +{
> +    ARMCPU *cpu = ARM_CPU(c);
> +    CPUARMState *env = &cpu->env;
> +    ram_addr_t ram_addr;
> +    hwaddr paddr;
> +
> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
> +
> +    if (addr) {
> +        ram_addr = qemu_ram_addr_from_host(addr);
> +        if (ram_addr != RAM_ADDR_INVALID &&
> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
> +            kvm_hwpoison_page_add(ram_addr);
> +            /* Asynchronous signal will be masked by main thread, so
> +             * only handle synchronous signal.
> +             */
> +            if (code == BUS_MCEERR_AR) {
> +                kvm_cpu_synchronize_state(c);
> +                if (GHES_CPER_FAIL != ghes_record_errors(ACPI_HEST_NOTIFY_SEA, paddr)) {
> +                    kvm_inject_arm_sea(c);
> +                } else {
> +                    fprintf(stderr, "failed to record the error\n");
> +                }
> +            }
> +            return;
> +        }
> +        fprintf(stderr, "Hardware memory error for memory used by "
> +                "QEMU itself instead of guest system!\n");
> +    }
> +
> +    if (code == BUS_MCEERR_AR) {
> +        fprintf(stderr, "Hardware memory error!\n");
> +        exit(1);
> +    }
> +}

This code appears to still be unconditionally trying to notify the
guest of the error via the ACPI tables without checking whether those
ACPI tables even exist. I told you about this in a previous round of
review :-(

thanks
-- PMM
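The missing check can be pictured as a standalone decision function, sketched here under stated assumptions. Everything in it is hypothetical: `sigbus_decide`, the `SIGBUS_CODE_AR`/`SIGBUS_CODE_AO` stand-ins for `BUS_MCEERR_AR`/`BUS_MCEERR_AO`, and the `ghes_present` flag are illustrative names, not QEMU API. The fallback (treat an action-required error with no GHES tables like the patch's "memory used by QEMU itself" path and exit) is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for BUS_MCEERR_AR / BUS_MCEERR_AO. */
enum { SIGBUS_CODE_AR, SIGBUS_CODE_AO };

/* Hypothetical outcomes of handling a memory-error SIGBUS. */
typedef enum {
    SIGBUS_IGNORE,        /* nothing to do now (e.g. asynchronous AO error) */
    SIGBUS_INJECT_SEA,    /* error recorded in guest GHES memory; inject SEA */
    SIGBUS_RECORD_FAILED, /* GHES record failed; report it, as the patch does */
    SIGBUS_EXIT,          /* cannot hand an action-required error to the guest */
} sigbus_action;

/*
 * guest_ram:    the host address maps to guest RAM
 * ghes_present: the guest was actually given ACPI HEST/GHES tables
 *               (hypothetical flag for the check the review asks for)
 * record_ok:    writing the error record into guest memory succeeded
 */
static sigbus_action sigbus_decide(int code, bool guest_ram,
                                   bool ghes_present, bool record_ok)
{
    assert(code == SIGBUS_CODE_AR || code == SIGBUS_CODE_AO);

    /* Only take the ACPI notification path when the tables exist. */
    if (guest_ram && ghes_present) {
        if (code == SIGBUS_CODE_AR) {
            return record_ok ? SIGBUS_INJECT_SEA : SIGBUS_RECORD_FAILED;
        }
        /* The patch only handles synchronous (AR) signals here. */
        return SIGBUS_IGNORE;
    }

    /* No way to report it to the guest: same fallback as the non-guest path. */
    return code == SIGBUS_CODE_AR ? SIGBUS_EXIT : SIGBUS_IGNORE;
}
```

With `ghes_present` false, an action-required error takes the exit path instead of writing into tables the guest never received.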