On Tue, Apr 23, 2024 at 02:10:13PM -0700, Guenter Roeck wrote: > Hi Alan, > > On 4/23/24 10:30, Alan Stern wrote: > > On Tue, Apr 23, 2024 at 10:04:17AM -0700, Guenter Roeck wrote: > > > Hi, > > > > > > when testing usb-ohci > > > > What is usb-ohci? Do you mean ohci-hcd? > > > > > with qemu's pci-ohci emulation, I keep getting > > > random usb interface timeouts. Sometimes the usb_hub_wq times out. > > ... > > > > > Sometimes there is an i/o scheduling timeout such as > > ... > > > > > This is not a new problem; I have seen it forever. Recently I spent some > > > time trying to understand the problem and found that the linux driver does > > > not always handle all ohci interupts. > > > > Please be more specific: _Which_ interrupts aren't being handled? That > > is, which flags remain set in the intrstatus register when the handler > > returns? > > > Sorry, I didn't think it was relevant. > > Here is the result of repeating the test with the linux patch several times > and passing the log output through "sort | uniq". > > 1 pending 0x2 mask 0x8000005a > 9 pending 0x4 mask 0x8000001e > 208 pending 0x4 mask 0x8000005e > > So it is almost always OHCI_INTR_SF and rarely OHCI_INTR_WDH. I see. Both of those flags can legitimately get turned on while the interrupt routine is running, and under heavy load this could happen fairly frequently. The SF (Start of Frame) interrupt occurs once every millisecond, and the WDH (Writeback Done Head) interrupt occurs more or less when a transfer completes. > For reference, this is repeatedly running > > CMDLINE="panic=-1 kunit.stats_enabled=2 kunit.filter=speed>slow root=/dev/sda rootwait console=ttyS0,115200 earlycon=uart8250,mmio,0x10000000,115200" > qemu-system-riscv32 -M virt -m 512M \ > -no-reboot -bios default -kernel arch/riscv/boot/Image -snapshot \ > -device virtio-net-device,netdev=net0 -netdev user,id=net0 -usb \ > -device pci-ohci,id=ohci -device usb-storage,bus=ohci.0,drive=d0 \ > -drive file=rootfs.ext2,if=none,id=d0,format=raw \ > -append "${CMDLINE}" \ > -nographic -monitor none \ > --trace 'usb*' -D "/tmp/usb-ohci.$$.${sequence}.trace.log" > > Tracing isn't really necessary but it changes timing enough that > the problem is more likely to be seen if it is active. The problem > is seen with various emulations; I just picked one of them. > > > > Since the interrupt is shared and > > > thus level triggered, that means that interrupts are still pending when > > > ohci_irq() exits. The interrupt core in Linux does not re-enter the > > > interrupt handler, presumably because it is marked as shared interrupt > > > and returns that the interrupt has been handled. > > > > Isn't that behavior mistaken? A level-triggered IRQ that remains set when > > it is re-enabled (when the various shared handlers return) should cause > > another interrupt to occur right away. > > > > Edged-triggered interrupts would be a different story, of course. > > > > Maybe I got it wrong; I thought that shared interrupts would have to be > level triggered. But then you are correct: one would think that level > triggered interrupts would keep executing interrupt handlers until the > interrupt is completely handled. I guess that means that I don't really > understand what is happening. Sorry for jumping to conclusions. To avoid the overhead of repeated interrupts, it would be best to check the interrupt status at the end of the routine and restart if any of the enabled bits are set, as in your first patch. If you would like to clean it up (get rid of the debugging stuff) and submit it, I'll review it. > > > I found two possible fixes for the problem. One essentially mirrors the > > > code from ehci_irq(), the other adds a (bad) kludge into qemu. Both "fix" > > > or work around the problem. > > > > > > Question is: What is actually wrong ? Something in the generic interrupt > > > handling code in Linux, something in the Linux usb-ohci driver, or > > > something in qemu ? Any idea how a proper fix might look like ? > > > > To answer these questions we need more information. > > > What else would you need ? Nothing else really, just the remaining interrupt flag bits. Alan Stern