> On Jan 22, 2025, at 10:13 AM, Michael S. Tsirkin <mst@xxxxxxxxxx> wrote: > > Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding. > > > On Wed, Jan 22, 2025 at 02:44:50PM +0000, Boyer, Andrew wrote: >> >> >> On Jan 9, 2025, at 8:42 AM, Michael S. Tsirkin <mst@xxxxxxxxxx> wrote: >> >> On Thu, Jan 09, 2025 at 01:01:20PM +0100, Christian Borntraeger wrote: >> >> >> Am 07.01.25 um 19:25 schrieb Andrew Boyer: >> >> Commit af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA >> feature >> support") added notification data support to the core virtio driver >> code. When this feature is enabled, the notification includes the >> updated producer index for the queue. Thus it is now critical that >> notifications arrive in order. >> >> The virtio_blk driver has historically not worried about >> notification >> ordering. Modify it so that the prepare and kick steps are both >> done >> under the vq lock. >> >> Signed-off-by: Andrew Boyer <andrew.boyer@xxxxxxx> >> Reviewed-by: Brett Creeley <brett.creeley@xxxxxxx> >> Fixes: af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA >> feature support") >> Cc: Viktor Prutyanov <viktor@xxxxxxxxxx> >> Cc: virtualization@xxxxxxxxxxxxxxx >> Cc: linux-block@xxxxxxxxxxxxxxx >> --- >> drivers/block/virtio_blk.c | 19 ++++--------------- >> 1 file changed, 4 insertions(+), 15 deletions(-) >> >> diff --git a/drivers/block/virtio_blk.c b/drivers/block/ >> virtio_blk.c >> index 3efe378f1386..14d9e66bb844 100644 >> --- a/drivers/block/virtio_blk.c >> +++ b/drivers/block/virtio_blk.c >> @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct >> blk_mq_hw_ctx *hctx) >> { >> struct virtio_blk *vblk = hctx->queue->queuedata; >> struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num]; >> - bool kick; >> spin_lock_irq(&vq->lock); >> - kick = virtqueue_kick_prepare(vq->vq); >> + virtqueue_kick(vq->vq); >> spin_unlock_irq(&vq->lock); >> - >> - if (kick) >> - virtqueue_notify(vq->vq); >> } >> >> >> I would assume this will be a performance nightmare for normal IO. >> >> >> >> >> Hello Michael and Christian and Jason, >> Thank you for taking a look. >> >> Is the performance concern that the vmexit might lead to the underlying virtual >> storage stack doing the work immediately? Any other job posting to the same >> queue would presumably be blocked on a vmexit when it goes to attempt its own >> notification. That would be almost the same as having the other job block on a >> lock during the operation, although I guess if you are skipping notifications >> somehow it would look different. >> >> I don't have any sort of setup where I can try it but I would appreciate it if >> someone else could. >> >> >> Hmm. Not good, notify can be very slow, holding a lock is a bad idea. >> Basically, virtqueue_notify must work ouside of locks, this >> means af8ececda185 is broken and we did not notice. >> >> Let's fix it please. >> >> >> With so many broken kernels already in the wild, I think disabling >> F_NOTIFICATION_DATA for virtio-blk would be a reasonable solution. > > Some devices might fail feature negotiation then. > I am not sure they are broken, devices might simply be able to > handle out of order values. > A driver which does not support F_NOTIFICATION_DATA should just clear that bit. Surely devices which support it would also support not enabling it? Otherwise pre-6.4 kernels wouldn't work at all. > >> >> Try some kind of compare and swap scheme where we detect that index >> was updated since? Will allow skipping a notification, too. >> >> >> Do you have an idea of how this might be done? Anything I've come up with >> involves a lock. >> >> Would it be doable to have a lock for the vq management stuff >> and a second one to post notifications? > > > and only for when F_NOTIFICATION_DATA is set. not terrible ok I think. > >> >> AMD guys, can't device survive with reordered notifications? >> Basically just drop a notification if you see index >> going back? >> >> >> This is the driver lying to us about the state of the queue; it's not going to >> be possible for us to work around it in hardware. For starters, how would we >> detect queue wrap around? >> >> Thank you, >> Andrew > > The index is a running value for split, for wrap arounds, there is > a special bit for that. No? > This is a hardware block used for many different interfaces and devices. When the notification write comes through, the doorbell block updates the queue state and schedules the queue for work. If a second notification comes in and overwrites that update before the queue is able to run (going backwards but not wrapping), we'll have no way of detecting it. -Andrew