On Thu, May 27, 2021 at 01:48:36PM +0800, Jason Wang wrote:
>
> On 2021/5/25 4:59 PM, Stefan Hajnoczi wrote:
> > On Tue, May 25, 2021 at 11:21:41AM +0800, Jason Wang wrote:
> > > On 2021/5/20 10:13 PM, Stefan Hajnoczi wrote:
> > > > Request completion latency can be reduced by using polling instead of
> > > > irqs. Even Posted Interrupts or similar hardware support doesn't beat
> > > > polling. The reason is that disabling virtqueue notifications saves
> > > > critical-path CPU cycles on the host by skipping irq injection and in
> > > > the guest by skipping the irq handler. So let's add blk_mq_ops->poll()
> > > > support to virtio_blk.
> > > >
> > > > The approach taken by this patch differs from the NVMe driver's
> > > > approach. NVMe dedicates hardware queues to polling and submits
> > > > REQ_HIPRI requests only on those queues. This patch does not require
> > > > exclusive polling queues for virtio_blk. Instead, it switches between
> > > > irqs and polling when one or more REQ_HIPRI requests are in flight on
> > > > a virtqueue.
> > > >
> > > > This is possible because toggling virtqueue notifications is cheap
> > > > even while the virtqueue is running. NVMe CQs can't do this because
> > > > irqs are only enabled/disabled at queue creation time.
> > > >
> > > > This toggling approach requires no configuration. There is no need to
> > > > dedicate queues ahead of time or to teach users and orchestration
> > > > tools how to set up polling queues.
> > > >
> > > > Possible drawbacks of this approach:
> > > >
> > > > - Hardware virtio_blk implementations may find virtqueue_disable_cb()
> > > >   expensive since it requires DMA.
> > >
> > > Note that it's probably not related to the behavior of the driver but
> > > the design of the event suppression mechanism.
> > >
> > > The device can choose to ignore the suppression flag and keep sending
> > > interrupts.
> >
> > Yes, it's the design of the event suppression mechanism.
> >
> > If we use dedicated polling virtqueues then the hardware doesn't need
> > to check whether interrupts are enabled for each notification. However,
> > there's no mechanism to tell the device that virtqueue interrupts are
> > permanently disabled. This means that as of today, even dedicated
> > virtqueues cannot suppress interrupts without the device checking for
> > each notification.
>
> This can be detected via a transport-specific way.
>
> E.g. in the case of MSI, VIRTIO_MSI_NO_VECTOR could be a hint.

Nice idea :). Then there would be no need for changes to the hardware
interface. IRQ-less virtqueues could still be mentioned explicitly in
the VIRTIO spec so that driver/device authors are aware of the
VIRTIO_MSI_NO_VECTOR trick.

> > > > +static int virtblk_poll(struct blk_mq_hw_ctx *hctx)
> > > > +{
> > > > +        struct virtio_blk *vblk = hctx->queue->queuedata;
> > > > +        struct virtqueue *vq = vblk->vqs[hctx->queue_num].vq;
> > > > +
> > > > +        if (!virtqueue_more_used(vq))
> > >
> > > I'm not familiar with block polling, but what happens if a buffer is
> > > made available after virtqueue_more_used() returns false here?
> >
> > Can you explain the scenario? I'm not sure I understand. "buffer is
> > made available" -> are you thinking about additional requests being
> > submitted by the driver or an in-flight request being marked used by
> > the device?
>
> Something like that:
>
> 1) requests are submitted
> 2) poll but virtqueue_more_used() returns false
> 3) device makes a buffer used
>
> In this case, will poll() be triggered again by somebody else? (I think
> interrupt is disabled here.)

Yes. An example blk_poll() user is
fs/block_dev.c:__blkdev_direct_IO_simple():

  qc = submit_bio(&bio);
  for (;;) {
          set_current_state(TASK_UNINTERRUPTIBLE);
          if (!READ_ONCE(bio.bi_private))
                  break;
          if (!(iocb->ki_flags & IOCB_HIPRI) ||
              !blk_poll(bdev_get_queue(bdev), qc, true))
                  blk_io_schedule();
  }

That's the infinite loop. The block layer implements the generic
portion of blk_poll(), and blk_poll() calls mq_ops->poll()
(virtblk_poll()).

So in general the polling loop will keep iterating, but there are
exceptions:

1. need_resched() causes blk_poll() to return 0 and blk_io_schedule()
   will be called.

2. blk-mq has a fancier io_poll algorithm that estimates I/O time and
   sleeps until the expected completion time to save CPU cycles. I
   haven't looked at this one in detail.

Both these cases affect existing mq_ops->poll() implementations (e.g.
NVMe). What's new in this patch series is that virtio-blk could have
non-polling requests on a virtqueue which now has irqs disabled, so we
could end up waiting on them with no irq to complete them.

I think there's an easy solution for this: don't disable virtqueue irqs
when there are non-REQ_HIPRI requests in flight. The disadvantage is
that we'll leave irqs enabled in more situations, so the performance
improvement may not apply in some configurations.
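A minimal sketch of that idea (untested; the counter fields and the
virtblk_update_cb() helper are made up for illustration and are not
part of the posted patches):

  /* Hypothetical bookkeeping added to the existing per-virtqueue struct. */
  struct virtio_blk_vq {
          struct virtqueue *vq;
          spinlock_t lock;
          unsigned int hipri_reqs; /* in-flight REQ_HIPRI requests */
          unsigned int irq_reqs;   /* in-flight non-REQ_HIPRI requests */
          char name[VQ_NAME_LEN];
  } ____cacheline_aligned_in_smp;

  /* Call with vblk_vq->lock held whenever either counter changes. */
  static void virtblk_update_cb(struct virtio_blk_vq *vblk_vq)
  {
          if (vblk_vq->hipri_reqs > 0 && vblk_vq->irq_reqs == 0) {
                  /* Only polled requests are in flight: suppress irqs. */
                  virtqueue_disable_cb(vblk_vq->vq);
          } else if (!virtqueue_enable_cb(vblk_vq->vq)) {
                  /*
                   * virtqueue_enable_cb() returning false means used
                   * buffers arrived while callbacks were disabled; a
                   * real implementation must re-run the completion
                   * path here so nothing is left stranded.
                   */
          }
  }

The submission path would bump hipri_reqs or irq_reqs depending on
REQ_HIPRI, the completion path would drop it, and each transition would
call virtblk_update_cb() under the vq lock.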
Stefan