Re: [PATCH v22 8/9] target-arm: kvm64: handle SIGBUS signal from kernel or KVM

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 17 Jan 2020 at 10:05, gengdongjiu <gengdongjiu@xxxxxxxxxx> wrote:
>
> On 2020/1/17 0:28, Peter Maydell wrote:
> > On Wed, 8 Jan 2020 at 11:33, Dongjiu Geng <gengdongjiu@xxxxxxxxxx> wrote:
> >
> >> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> >> +{
> >> +    ram_addr_t ram_addr;
> >> +    hwaddr paddr;
> >> +
> >> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
> >> +
> >> +    if (acpi_enabled && addr &&
> >> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> >> +        ram_addr = qemu_ram_addr_from_host(addr);
> >> +        if (ram_addr != RAM_ADDR_INVALID &&
> >> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
> >> +            kvm_hwpoison_page_add(ram_addr);
> >> +            /*
> >> +             * Asynchronous signal will be masked by main thread, so
> >> +             * only handle synchronous signal.
> >> +             */
> >
> > I don't understand this comment. (I think we've had discussions
> > about it before, but it's still not clear to me.)
> >
> > This function (kvm_arch_on_sigbus_vcpu()) will be called in two contexts:
> >
> > (1) in the vcpu thread:
> >   * the real SIGBUS handler sigbus_handler() sets a flag and arranges
> >     for an immediate vcpu exit
> >   * the vcpu thread reads the flag on exit from KVM_RUN and
> >     calls kvm_arch_on_sigbus_vcpu() directly
> >   * the error could be MCEERR_AR or MCEERR_AOFor the vcpu thread, the error can be MCEERR_AR or MCEERR_AO,
> but kernel/KVM usually uses MCEERR_AR(action required) instead of MCEERR_AO, because it needs do action immediately. For MCEERR_AO error, the action is optional and the error can be ignored.
> At least I do not find Linux kernel/KVM deliver MCEERR_AO in the vcpu threads.
>
> > (2) MCE errors on other threads:
> >   * here SIGBUS is blocked, so MCEERR_AR (action-required)
> >     errors will cause the kernel to just kill the QEMU process
> >   * MCEERR_AO errors will be handled via the iothread's use
> >     of signalfd(), so kvm_on_sigbus() will get called from
> >     the main thread, and it will call kvm_arch_on_sigbus_vcpu()
> >   * in this case the passed in CPUState will (arbitrarily) be that
> >     for the first vCPU
>
> For the MCE errors on other threads, it can only handle MCEERR_AO. If it is MCEERR_AR, the QEMU will assert and exit[2].
>
> Case1: Other APP indeed can send MCEERR_AO to QEMU, QEMU handle it via the iothread's use of signalfd() through above path.
> Case2: But if the MCEERR_AO is delivered by kernel, I see QEMU ignore it because SIGBUS is masked in main thread[3], for this case, I do not see QEMU handle it via signalfd() for MCEERR_AO errors from my test.

SIGBUS is blocked in the main thread because we use signalfd().
The function sigfd_handler() should be called and it will then
manually invoke the correct function for the signal.

> For Case1,I think we should not let guest know it, because it is not triggered by guest. only other APP send SIGBUS to tell QEMU do somethings.

I don't understand what you mean here by "other app" or
"guest" triggering of MCEERR. I thought that an MCEERR meant
"the hardware has detected that there is a problem with the
RAM". If there's a problem with the RAM and it's the RAM that's
being used as guest RAM, we need to tell the guest, surely ?

> For Case2,it does not call call kvm_arch_on_sigbus_vcpu().

It should do. The code you quote calls that function
for that case:

> [1]:
> /* Called synchronously (via signalfd) in main thread.  */
> int kvm_on_sigbus(int code, void *addr)
> {
> #ifdef KVM_HAVE_MCE_INJECTION
>     /* Action required MCE kills the process if SIGBUS is blocked.  Because
>      * that's what happens in the I/O thread, where we handle MCE via signalfd,
>      * we can only get action optional here.
>      */
> [2]: assert(code != BUS_MCEERR_AR);
>     kvm_arch_on_sigbus_vcpu(first_cpu, code, addr);
>     return 0;
> #else
>     return 1;
> #endif
> }


> Above all, from my test, for MCEERR_AO error which is triggered by guest, it not call
kvm_arch_on_sigbus_vcpu().

I'm not sure what you mean by "triggered by guest". I assume that
exactly what kind of errors the kernel can report and when will
depend to some extent on the underlying hardware/firmware
implementation of reporting of memory errors, but in principle
the ABI allows the kernel to send SIGBUS_(BUS_MCEERR_AO) to the
main thread, the signal should be handled by signalfd, our code
for working with multiple fds should mean that the main thread
calls sigfd_handler() to deal with reading bytes from the signalfd
fd, and that function should then call sigbus_handler(), which
calls kvm_on_sigbus(), which calls kvm_arch_on_sigbus_vcpu().
If something in that code path is not working then we need to
find out what it is.

thanks
-- PMM




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux