Re: Problems (timeouts) when testing usb-ohci with qemu

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/24/24 08:23, Guenter Roeck wrote:
On 4/24/24 04:16, Gerd Hoffmann wrote:
qemu hack:

  hw/usb/hcd-ohci.c | 11 +++++++++++
  hw/usb/hcd-ohci.h |  1 +
  2 files changed, 12 insertions(+)

diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
index fc8fc91a1d..99e52ad13a 100644
--- a/hw/usb/hcd-ohci.c
+++ b/hw/usb/hcd-ohci.c
@@ -267,6 +267,10 @@ static inline void ohci_intr_update(OHCIState *ohci)
          (ohci->intr_status & ohci->intr))
          level = 1;
+    if (level && ohci->level)
+        qemu_set_irq(ohci->irq, 0);
+
+    ohci->level = level;
      qemu_set_irq(ohci->irq, level);
  }
diff --git a/hw/usb/hcd-ohci.h b/hw/usb/hcd-ohci.h
index e1827227ac..6f82e72bd9 100644
--- a/hw/usb/hcd-ohci.h
+++ b/hw/usb/hcd-ohci.h
@@ -52,6 +52,7 @@ struct OHCIState {
      uint32_t ctl, status;
      uint32_t intr_status;
      uint32_t intr;
+    int level;
      /* memory pointer partition */
      uint32_t hcca;

Phew.  Disclaimer: Havn't looked at the ohci emulation code for years.

It should not be needed to store the interrupt level that way.  It is
possible to calculate what the interrupt level should be, based on the
interrupt status register and the interrupt mask register, and the code
above seems to do exactly that (the "ohci->intr_status & ohci->intr"
bit).


You are correct. For the purpose of this kludge a simpler
+    qemu_set_irq(ohci->irq, 0);
     qemu_set_irq(ohci->irq, level);

would have been sufficient. My original code added tracing,
where this generated a lot of noise. I didn't completely simplify
the kludge. Sorry for that and for any confusion it may have caused.

ohci_intr_update() must be called if one of these two registers changes,
i.e. if the guest changes the mask, if the guest acks an IRQ by clearing
an status bit, if the device raises an IRQ by setting an status bit.
Might be the ohci emulation has a bug here.

Another possible trouble spot is that the timing behavior is different
on virtual vs. physical hardware.  Specifically with the emulated
hardware some actions appear to complete instantly (when the vmexit to
handle the mmio register write returns it's finished already), which
will never complete that quickly on physical hardware.  So drivers can
have race conditions which only trigger on virtual hardware.  The error
pattern you are describing sounds like this could be the case here.


I think the underlying problem is that both the qemu emulation and
the Linux kernel driver expect that the interrupt is level triggered.
It looks like some entity in between handles the interrupts as edge
triggered. This makes the kludge necessary: All it does is to generate
an artificial interrupt edge.

This can be worked around in the Linux interrupt handler by checking
if another interrupt arrived while the original interrupt was handled.
This will ensure that all interrupts are handled and cleared when the
handler exits, and that a later arriving interrupt will generate the
necessary edge and thus another interrupt. That doesn't fix the
edge<->level triggered interrupt confusion (if that is indeed the root
cause of the problem), but it does address its consequences.

If anyone has an idea how to find out where the interrupt confusion
happens, please let me know, and I'll be happy to do some more testing.
I am quite curious myself, and it would make sense to solve the problem
at its root.

Update:

I found upstream commit 0b60557230ad ("usb: ehci: Prevent missed ehci
interrupts with edge-triggered MSI") which I think explains the problem.

Guenter





[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux