I'd like to attend LSF/MM and would like to discuss polling for block
drivers.
Currently there is blk-iopoll, but it is not as widely used as NAPI is in
the networking field, and according to Sagi's findings in [1], performance
with polling is not on par with IRQ usage.
At LSF/MM I'd like to discuss whether it is desirable to have NAPI-like
polling in more block drivers and how to overcome the performance issues
currently seen.
[1] http://lists.infradead.org/pipermail/linux-nvme/2016-October/006975.html
A typical Ethernet network adapter delays the generation of an interrupt
after it has received a packet. A typical block device or HBA does not delay
the generation of an interrupt that reports an I/O completion. I think that
is why polling is more effective for network adapters than for block
devices. I'm not sure whether it is possible to achieve benefits similar to
NAPI for block devices without implementing interrupt coalescing in the
block device firmware. Note: for block device implementations that use the
RDMA API, the RDMA API supports interrupt coalescing (see also
ib_modify_cq()).
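
To make the coalescing argument concrete, here is a minimal userspace
sketch of how a device might batch completions into interrupts. It is
not kernel code and does not model any real adapter: the function name
sim_coalesced_irqs() and its two knobs (max_count, max_delay_us) are
assumptions made for illustration, loosely mirroring the count/period
style of moderation that RDMA completion queues expose.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: count the interrupts a device would raise for
 * completions arriving at times ts[0..n-1] (microseconds, nondecreasing).
 * An interrupt fires when max_count completions are pending, or when
 * more than max_delay_us has passed since the first pending completion.
 */
static size_t sim_coalesced_irqs(const unsigned long *ts, size_t n,
                                 size_t max_count,
                                 unsigned long max_delay_us)
{
    size_t irqs = 0, pending = 0;
    unsigned long first = 0;

    assert(max_count > 0);

    for (size_t i = 0; i < n; i++) {
        /* Delay timer expired before this arrival: flush the batch. */
        if (pending > 0 && ts[i] - first > max_delay_us) {
            irqs++;
            pending = 0;
        }
        if (pending == 0)
            first = ts[i];      /* start of a new batch */
        pending++;
        if (pending == max_count) {
            irqs++;             /* count threshold reached */
            pending = 0;
        }
    }
    if (pending > 0)
        irqs++;                 /* last batch flushed by the timer */
    return irqs;
}
```

With max_count set to 1 the model degenerates to one interrupt per
completion, which is the behavior described above for a typical block
device or HBA.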
Hey Bart,
I don't agree that interrupt coalescing is the reason why irq-poll is
not suitable for nvme or storage devices.
First, when the nvme device fires an interrupt, the driver consumes
the completion(s) in the interrupt handler (usually there will be some
more completions waiting in the cq by the time the host starts processing
it). With irq-poll, we disable further interrupts and schedule soft-irq
for processing, which, if anything, improves the completions per interrupt
ratio (because it takes slightly longer before the cq is processed).
Moreover, irq-poll is budgeting the completion queue processing, which is
important for a few reasons.
1. It prevents hard-irq context abuse like we do today. If other CPU
cores are pounding with more submissions on the same queue, we might
get into a hard-lockup (which I've seen happening).
2. irq-poll maintains fairness between devices by correctly budgeting
the processing of different completion queues that share the same
affinity. This can become crucial when working with multiple nvme
devices, each of which has multiple io queues that share the same IRQ
assignment.
3. It reduces (or at least should reduce) the overall number of
interrupts in the system because we only enable interrupts again
when the completion queue is completely processed.
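
The three points above can be sketched with a small userspace model of
budgeted polling. This is an illustration only, not the kernel's
irq_poll implementation: struct sim_cq, sim_poll() and sim_poll_round()
are hypothetical names, and the "interrupt" is just a flag. The real
irq-poll code decides when to re-arm differently; here a queue is
re-armed as soon as it is empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a completion queue. */
struct sim_cq {
    unsigned int pending;    /* completions waiting to be reaped */
    unsigned int processed;  /* completions reaped so far */
    int irq_enabled;         /* set to 1 once the queue is drained */
};

/* Reap at most `budget` completions; re-arm only when fully drained. */
static unsigned int sim_poll(struct sim_cq *cq, unsigned int budget)
{
    unsigned int done = cq->pending < budget ? cq->pending : budget;

    cq->pending -= done;
    cq->processed += done;
    if (cq->pending == 0)
        cq->irq_enabled = 1;
    return done;
}

/*
 * One softirq-like pass over all queues that share a CPU. Every queue
 * with its interrupt masked gets the same budget, so a busy queue
 * cannot starve a quiet one. Returns how many queues still have work.
 */
static size_t sim_poll_round(struct sim_cq *cqs, size_t n,
                             unsigned int budget)
{
    size_t still_busy = 0;

    assert(budget > 0);

    for (size_t i = 0; i < n; i++) {
        if (cqs[i].irq_enabled)
            continue;           /* idle queue, waiting for an interrupt */
        sim_poll(&cqs[i], budget);
        if (!cqs[i].irq_enabled)
            still_busy++;       /* needs another pass */
    }
    return still_busy;
}
```

With two queues holding 1000 and 10 pending completions and a budget of
64, one pass reaps 64 from the first and all 10 from the second; only
the second is re-armed, and the first is revisited on the next pass
instead of monopolizing the CPU.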
So overall, I think it's very useful for nvme and other modern HBAs.
Unfortunately, other than solving (1), I wasn't able to see a
performance improvement, only a slight regression, and I can't
explain where it's coming from...