Re: [PATCH V2] ACPI / APEI: restore interrupt before panic in sdei flow

乱石 <zhangliguang@xxxxxxxxxxxxxxxxx> · Thu, 14 Oct 2021 22:18:54 +0800

Hi,

在 2021/10/14 1:44, James Morse 写道:
Hello!

On 12/10/2021 15:29, Liguang Zhang wrote:
When hest acpi table configure Hardware Error Notification type as
Software Delegated Exception(0x0B) for RAS event, OS RAS interacts with
ATF by SDEI mechanism. On the firmware first system, OS was notified by
ATF sdei call.

The calling flow like as below when fatal RAS error happens:

ATF notify OS flow:
   sdei_dispatch_event()
     ehf_activate_priority()
       call sdei callback  // callback registered by OS
     ehf_deactivate_priority()

OS sdei callback:
   sdei_asm_handler()
     __sdei_handler()
       _sdei_handler()
         sdei_event_handler()
           ghes_sdei_critical_callback()
             ghes_in_nmi_queue_one_entry()
               /* if RAS error is fatal */
               __ghes_panic()
                 panic()

If fatal RAS error occured, panic was called in sdei_asm_handle()
without ehf_deactivate_priority executed, which lead interrupt masked.
So far the story is:
Firmware generated and SDEI event (a kind of software NMI) because of a firmware
interrupt, but it hasn't completely handled the interrupt.

If interrupt masked, system would be halted in kdump flow like this:

arm-smmu-v3 arm-smmu-v3.3.auto: allocated 65536 entries for cmdq
arm-smmu-v3 arm-smmu-v3.3.auto: allocated 32768 entries for evtq
arm-smmu-v3 arm-smmu-v3.3.auto: allocated 65536 entries for priq
arm-smmu-v3 arm-smmu-v3.3.auto: SMMU currently enabled! Resetting...
How and why do firmware interrupts affect the IOMMU?

It sounds like you are sharing something with firmware that you shouldn't.

After debug, we found accurate halted position is:
arm_smmu_device_probe()
   arm_smmu_device_reset()
     arm_smmu_device_disable()
       arm_smmu_write_reg_sync()
         readl_relaxed_poll_timeout()
           readx_poll_timeout()
             read_poll_timeout()
               usleep_range() // hrtimer is never waked.

So interrupt should be restored before panic otherwise kdump will trigger
error.
Why can't firmware finish with the interrupt before injecting the SDEI event?
If you need it to not happen a second time while the handler runs, you can always disable it.

The text in the spec about the interaction of complete and physical interrupts is for
bound interrupts. Linux doesn't support these. It isn't possible for linux to know whether
firmware tied any other kind of event to a physical interrupt or not.

In the process of sdei, a SDEI_EVENT_COMPLETE_AND_RESUME call
should be called before panic for a completed run of ehf_deactivate_priority().
SDEI_EVENT_COMPLETE_AND_RESUME is a complete, it tells firmware to restore the execution
state from before the event. You get almost get away with x17-x30 being corrupted as
panic() won't return - but the stack trace produced will be corrupt. If the original
exception was from user-space, SP_EL0 will have been restored to be the user value. The
kernel uses this for 'current'.

The way this is supposed to work is the die-ing kernel calls SDEI_PE_MASK while it does
the kdump reboot. Once the kdump kernel has started, the SDEI_PRIVATE_RESET and
SDEI_SHARED_RESET calls should fix anything left over in firmware.

Could you debug why firmware interrupts being active prevent the SMMU from being reset. As
far as I can tell, those should be totally independent.

If ehf_deactivate_priority() was not executed, pmr_el1 register was not 
resumed to >0x80, which leads

non-secure interrupts masked. arm_smmu_device_probe() finally called 
usleep_range() which based on

hrtimer. Because non-secure timer interrupts was masked, usleep_range 
would not reponse.

Thanks.

Liguang

Thanks,

James