Hi Alan,
On 4/23/24 10:30, Alan Stern wrote:
On Tue, Apr 23, 2024 at 10:04:17AM -0700, Guenter Roeck wrote:
Hi,
when testing usb-ohci
What is usb-ohci? Do you mean ohci-hcd?
with qemu's pci-ohci emulation, I keep getting
random usb interface timeouts. Sometimes the usb_hub_wq times out.
...
Sometimes there is an i/o scheduling timeout such as
...
This is not a new problem; I have seen it forever. Recently I spent some
time trying to understand the problem and found that the linux driver does
not always handle all ohci interupts.
Please be more specific: _Which_ interrupts aren't being handled? That
is, which flags remain set in the intrstatus register when the handler
returns?
Sorry, I didn't think it was relevant.
Here is the result of repeating the test with the linux patch several times
and passing the log output through "sort | uniq".
1 pending 0x2 mask 0x8000005a
9 pending 0x4 mask 0x8000001e
208 pending 0x4 mask 0x8000005e
So it is almost always OHCI_INTR_SF and rarely OHCI_INTR_WDH.
For reference, this is repeatedly running
CMDLINE="panic=-1 kunit.stats_enabled=2 kunit.filter=speed>slow root=/dev/sda rootwait console=ttyS0,115200 earlycon=uart8250,mmio,0x10000000,115200"
qemu-system-riscv32 -M virt -m 512M \
-no-reboot -bios default -kernel arch/riscv/boot/Image -snapshot \
-device virtio-net-device,netdev=net0 -netdev user,id=net0 -usb \
-device pci-ohci,id=ohci -device usb-storage,bus=ohci.0,drive=d0 \
-drive file=rootfs.ext2,if=none,id=d0,format=raw \
-append "${CMDLINE}" \
-nographic -monitor none \
--trace 'usb*' -D "/tmp/usb-ohci.$$.${sequence}.trace.log"
Tracing isn't really necessary but it changes timing enough that
the problem is more likely to be seen if it is active. The problem
is seen with various emulations; I just picked one of them.
Since the interrupt is shared and
thus level triggered, that means that interrupts are still pending when
ohci_irq() exits. The interrupt core in Linux does not re-enter the
interrupt handler, presumably because it is marked as shared interrupt
and returns that the interrupt has been handled.
Isn't that behavior mistaken? A level-triggered IRQ that remains set when
it is re-enabled (when the various shared handlers return) should cause
another interrupt to occur right away.
Edged-triggered interrupts would be a different story, of course.
Maybe I got it wrong; I thought that shared interrupts would have to be
level triggered. But then you are correct: one would think that level
triggered interrupts would keep executing interrupt handlers until the
interrupt is completely handled. I guess that means that I don't really
understand what is happening. Sorry for jumping to conclusions.
I found two possible fixes for the problem. One essentially mirrors the
code from ehci_irq(), the other adds a (bad) kludge into qemu. Both "fix"
or work around the problem.
Question is: What is actually wrong ? Something in the generic interrupt
handling code in Linux, something in the Linux usb-ohci driver, or
something in qemu ? Any idea how a proper fix might look like ?
To answer these questions we need more information.
What else would you need ?
Thanks,
Guenter