On 23.02.21 10:26, Ross Lagerwall wrote:
> On 2021-02-19 15:40, Juergen Gross wrote:
> > An event channel should be kept masked when an eoi is pending for it.
> > When being migrated to another cpu it might be unmasked, though.
> >
> > In order to avoid this keep three different flags for each event
> > channel to be able to distinguish "normal" masking/unmasking from eoi
> > related masking/unmasking and temporary masking. The event channel
> > should only be able to generate an interrupt if all flags are cleared.
> >
> > Cc: stable@xxxxxxxxxxxxxxx
> > Fixes: 54c9de89895e0a36047 ("xen/events: add a new late EOI evtchn framework")
> > Reported-by: Julien Grall <julien@xxxxxxx>
> > Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
>
> I tested this patch series backported to a 4.19 kernel and found that
> when doing a reboot loop of Windows with PV drivers, occasionally it
> will end up in a state with some event channels pending and masked in
> dom0 which breaks networking in the guest.
>
> The issue seems to have been introduced with this patch, though at
> first glance it appears correct. I haven't yet looked into why it is
> happening. Have you seen anything like this with this patch?
I have found the issue.

lateeoi_mask_ack_dynirq() must not set the "eoi" mask reason flag. This
callback is invoked in cases where the handler will not be called
afterwards, so there will never be a call of xen_irq_lateeoi() to unmask
the event channel again.

Juergen
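
For readers following the thread, the masking scheme described in the
patch can be sketched as a small user-space model. This is only an
illustration under stated assumptions, not the kernel code: the names
mask_reason, evtchn_model, model_mask() and model_unmask() are
hypothetical, and hw_masked merely stands in for the real event channel
mask bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; one bit per reason an event channel is masked. */
enum mask_reason {
    MASK_REASON_EXPLICIT    = 0x01, /* "normal" masking/unmasking */
    MASK_REASON_TEMPORARY   = 0x02, /* temporary, e.g. during migration */
    MASK_REASON_EOI_PENDING = 0x04, /* an EOI is still outstanding */
};

struct evtchn_model {
    uint8_t mask_reason; /* OR of all reasons currently set */
    bool hw_masked;      /* stand-in for the real mask bit */
};

/* Record a reason; mask the channel if it was not masked before. */
static void model_mask(struct evtchn_model *ch, uint8_t reason)
{
    assert(reason != 0);
    if (ch->mask_reason == 0)
        ch->hw_masked = true;
    ch->mask_reason |= reason;
}

/* Clear a reason; unmask only once no reason at all is left. */
static void model_unmask(struct evtchn_model *ch, uint8_t reason)
{
    assert(reason != 0);
    ch->mask_reason &= (uint8_t)~reason;
    if (ch->mask_reason == 0)
        ch->hw_masked = false;
}
```

In this model, clearing the temporary reason after a migration leaves
the channel masked as long as the EOI reason is still set. It also shows
the failure mode reported above: any path that sets the EOI reason
without a later matching unmask leaves the channel masked for good.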