Re: [PATCH] usb: hcd: Revert 306c54d0edb6ba94d39877524dddebaad7770cf2: Try MSI interrupts on PCI devices

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Tue, 13 Jul 2021 15:15:48 -0400

On Tue, Jul 13, 2021 at 02:50:42PM -0400, Laurence Oberman wrote:
> Customers have been reporting that the I/O is radically being
> slowed down to HPE virtual USB ILO served DVD images during installation.
> 
> Lots of investigation by the Red Hat lab has found that the issue is 
> because MSI edge interrupts do not work properly for these 
> ILO USB devices.
> We start fast and then drop to polling mode and its unusable.
> 
> The issue exists currently upstream on 5.13 as tested by Red Hat, 
> and reverting the mentioned patch corrects this upstream.
> 
> David Jeffery has this explanation:
> 
> The problem with the patch turning on MSI appears to be that the ehci 
> driver (and possibly other usb controller types too) wasn't written to
> support edge-triggered interrupts.
> The ehci_irq routine appears to be written in such a way that it will 
> be racy with multiple interrupt source bits.
> With a level-triggered interrupt, it gets called another time and cleans 
> up other interrupt sources.
> But with MSI edge, the interrupt state staying high results in no 
> new interrupt and ehci has to run based on polling.
> 
> static irqreturn_t ehci_irq (struct usb_hcd *hcd)
> {
> ...
>         status = ehci_readl(ehci, &ehci->regs->status);
> 
>         /* e.g. cardbus physical eject */
>         if (status == ~(u32) 0) {
>                 ehci_dbg (ehci, "device removed\n");
>                 goto dead;
>         }
> 
>         /*
>          * We don't use STS_FLR, but some controllers don't like it to
>          * remain on, so mask it out along with the other status bits.
>          */
>         masked_status = status & (INTR_MASK | STS_FLR);
> 
>         /* Shared IRQ? */
>         if (!masked_status || unlikely(ehci->rh_state == EHCI_RH_HALTED)) {
>                 spin_unlock_irqrestore(&ehci->lock, flags);
>                 return IRQ_NONE;
>         }
> 
>         /* clear (just) interrupts */
>         ehci_writel(ehci, masked_status, &ehci->regs->status);
> ...
> 
> ehci_irq() reads the interrupt status register and then writes the active 
> interrupt-related bits back out to ack the interrupt cause.
> But with an edge interrupt, this is racy as another source of interrupt 
> could be raised by ehci between the read and the write reaching the 
> hardware. 
> e.g.  If STS_IAA was set during the initial read, but some other bit like 
> STS_INT gets raised by the hardware between the read and the write to the 
> interrupt status register, the interrupt signal state won't drop.
> The interrupt state says high, and since it is now edged triggered with 
> MSI, no new invocation of the interrupt handler gets triggered.

Wouldn't it be better to change these other PCI drivers by adding 
proper MSI support?  I don't know what would be involved, but 
presumably it wouldn't be very hard.  (Just run the handler in a loop 
until all the interrupt status bits are off?)

Alan Stern