> -----Original Message-----
> From: Sagi Grimberg [mailto:sagig@xxxxxxxxxxxxxxxxxx]
> Sent: Tuesday, November 04, 2014 6:15 AM
> To: Bart Van Assche; Elliott, Robert (Server Storage); Christoph Hellwig
> Cc: Jens Axboe; Sagi Grimberg; Sebastian Parschauer; Ming Lei;
> linux-scsi@xxxxxxxxxxxxxxx; linux-rdma
> Subject: Re: [PATCH v2 12/12] IB/srp: Add multichannel support
> ...
> I think that Rob and I are not talking about the same issue. In
> case only a single core is servicing interrupts it is indeed expected
> that it will spend 100% in hard-irq, that's acceptable since it is
> pounded with completions all the time.
>
> However, I'm referring to a condition where SRP will spend infinite
> time servicing a single interrupt (while loop on ib_poll_cq that never
> drains) which will lead to a hard lockup.
>
> This *can* happen, and I do believe that with an optimized IO path
> it is even more likely to.

If the IB completions/interrupts are only for IOs submitted on this
CPU, then the CQ will eventually drain, because this CPU cannot submit
anything new while it is stuck in the completion loop.

The workload can become bursty, though: the CPU submits a batch of
IOs, then spends its time completing all of them without submitting
more, so the queue depth bounces between zero and a high value. I've
seen that with both the hpsa and mpt3sas drivers. The fio options
iodepth_batch, iodepth_batch_complete, and iodepth_low (with
ioengine=libaio) can amplify or dampen that effect.

I haven't found a good way for the LLD ISRs and the block layer
completion code to decide to yield the CPU based on how much time
they are taking; that would almost qualify as a realtime kernel
feature. If you compile with CONFIG_IRQ_TIME_ACCOUNTING, the kernel
does keep track of that information; perhaps it could be exported so
modules can use it?

---
Rob Elliott, HP Server Storage
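
To make the lockup mechanism concrete: the problem Sagi describes is an
unbounded poll loop in the completion handler, and a common mitigation
(not necessarily what srp does today) is to cap the completions handled
per interrupt and then re-arm the CQ. A minimal sketch follows;
SRP_POLL_BUDGET, srp_handle_wc(), and srp_reschedule_poll() are
invented for illustration and are not the in-tree srp code:

    /* Budget-limited completion handler sketch. */
    #include <rdma/ib_verbs.h>

    #define SRP_POLL_BUDGET 64      /* max completions per interrupt */

    static void srp_cq_comp_handler(struct ib_cq *cq, void *ctx)
    {
            struct ib_wc wc;
            int budget = SRP_POLL_BUDGET;

            /* An unbounded "while (ib_poll_cq(...) > 0)" never exits if
             * new completions arrive as fast as they are reaped - the
             * hard-lockup case above.  Bound the work instead. */
            while (budget-- > 0 && ib_poll_cq(cq, 1, &wc) > 0)
                    srp_handle_wc(ctx, &wc);   /* hypothetical per-WC handler */

            /* Re-arm the CQ.  IB_CQ_REPORT_MISSED_EVENTS makes the call
             * return > 0 if unpolled completions remain (race with
             * re-arming, or budget exhausted), so leftover work can be
             * deferred, e.g. to a tasklet, instead of looping here. */
            if (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
                                     IB_CQ_REPORT_MISSED_EVENTS) > 0)
                    srp_reschedule_poll(ctx);  /* hypothetical deferral */
    }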
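
For reproducing the bursty queue-depth pattern, a fio job along these
lines exaggerates it; /dev/sdX is a placeholder and the values are only
illustrative:

    ; exaggerate submit/complete burstiness (assumed device path)
    [global]
    ioengine=libaio
    direct=1
    rw=randread
    bs=4k

    [burst]
    filename=/dev/sdX
    iodepth=96
    iodepth_batch=96            ; submit up to 96 IOs in one call
    iodepth_batch_complete=96   ; reap 96 completions at a time
    iodepth_low=1               ; let the queue drain before refilling

Going the other way (iodepth_batch_complete=1 and iodepth_low close to
iodepth) keeps the device queue topped up and smooths the pattern out.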
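
On the CONFIG_IRQ_TIME_ACCOUNTING point: the accounting lives in
kernel/sched/cputime.c (irqtime_account_irq()), but the per-CPU
accessor is currently static there. If it were exported, a driver-side
yield check could be as simple as this sketch; both the export and the
500 us budget are assumptions, not existing kernel API:

    #include <linux/smp.h>
    #include <linux/time.h>

    /* Hypothetical: assumes irq_time_read(cpu), which accumulates
     * per-CPU hard+soft irq time in nanoseconds when
     * CONFIG_IRQ_TIME_ACCOUNTING is enabled, were exported. */
    static bool srp_poll_budget_exceeded(u64 irq_time_start)
    {
            return irq_time_read(smp_processor_id()) - irq_time_start >
                   500 * NSEC_PER_USEC;
    }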