On 08/17/2017 02:29 PM, Keith Busch wrote: > On Thu, Aug 17, 2017 at 02:17:08PM -0600, Jens Axboe wrote: >> On 08/17/2017 02:15 PM, Keith Busch wrote: >>> On Thu, Aug 17, 2017 at 01:32:20PM -0600, Jens Axboe wrote: >>>> We currently have an issue with nvme when polling is used. Just >>>> ran some testing on 4.13-rc5, and it's trivial to trigger an IRQ >>>> disable ala: >>>> >>>> [ 52.412851] irq 77: nobody cared (try booting with the "irqpoll" option) >>>> [ 52.415310] irq 70: nobody cared (try booting with the "irqpoll" option) >>>> >>>> when running a few processes polling. The reason is pretty obvious - if >>>> we're effective at polling, the triggered IRQ will never find any >>>> events. If this happens enough times in a row, the kernel disables our >>>> IRQ since we keep returning IRQ_NONE. >>> >>> If you're seeing IRQ_NONE returned, the NVMe driver didn't poll any >>> completions since the last time nvme_irq was called. The cqe_seen on >>> polled compeletions is sticky until the IRQ handler is run, in which >>> case it returns IRQ_HANDLED even when no completions were handled during >>> that interrupt. >>> >>> The only way it should be able to return IRQ_NONE is if no completions >>> were observed (polled or otherwise) since the last time the IRQ handler >>> was called. >> >> The polling do not update the cqe_seen. So it's possible that every time >> the IRQ handler does trigger, there are no entries found. Maybe a better >> or simpler fix would be to have the polling set cqe_seen to true, and >> leave the clearing to the interrupt handler as is done now. > > Oops, that looks like a mistake. I don't think we want to suppress > spurious interrupt detection, though. How about this patch to set cq_seen > on polling like we used to have? Yes, that will work. We need to get this into 4.13. -- Jens Axboe