Michael S. Tsirkin wrote:
> > > > >> + * the page if the vq is full. We are adding one entry each time,
> > > > >> + * which essentially results in no memory allocation, so the
> > > > >> + * GFP_KERNEL flag below can be ignored.
> > > > >> + */
> > > > >> + if (vq->num_free) {
> > > > >> + err = virtqueue_add_inbuf(vq, &sg, 1, vq, GFP_KERNEL);
> > > > >
> > > > > Should we kick here? At least when ring is close to
> > > > > being full. Kick at half way full?
> > > > > Otherwise it's unlikely ring will
> > > > > ever be cleaned until we finish the scan.
> > > >
> > > > Since this add_one_sg() is called between spin_lock_irqsave(&zone->lock, flags)
> > > > and spin_unlock_irqrestore(&zone->lock, flags), it is not permitted to sleep.
> > >
> > > kick takes a while sometimes but it doesn't sleep.
> >
> > I don't know about virtio. But the purpose of kicking here is to wait for pending data
> > to be flushed in order to increase vq->num_free, isn't it?
>
> It isn't. It's to wake up device out of sleep to make it start
> processing the pending data. If device isn't asleep, it's a nop.

If vq->num_free == 0, we need to wait until vq->num_free > 0 before
virtqueue_add_inbuf() can succeed. When will vq->num_free++ be called?

You said virtqueue_kick() is a no-op if the device is not asleep.
Then, there is no guarantee that we can make vq->num_free > 0
by calling virtqueue_kick().

Are you saying that the

	virtqueue_kick(vq);
	while (!vq->num_free)
		virtqueue_get_buf(vq, &unused);
	err = virtqueue_add_inbuf(vq, &sg, 1, vq, GFP_KERNEL);
	BUG_ON(err);

sequence is safe from IRQ-disabled atomic context? If not, what is the
point of calling virtqueue_kick() when the ring is close to being
(half way) full? We can't guarantee that all data is sent to QEMU
after all.

Also, why does the cmd id matter? If VIRTIO_BALLOON_F_FREE_PAGE_VQ does
not guarantee atomicity, I don't see the point of communicating the
cmd id between QEMU and the guest kernel. Just an EOF marker should be
enough.

I do want to see the changes for the QEMU side in order to review the
changes for the guest kernel side.
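
To make the num_free question above concrete, here is a minimal user-space
sketch of how I currently understand the ring. It is only a toy model under
my own assumptions: struct toy_vq and all toy_*() helpers are hypothetical
names, not the virtio API, and the device side is reduced to a function that
somebody else has to call. The only behaviour taken from this thread is that
a kick wakes a sleeping device and is otherwise a no-op.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 4u

/* Toy stand-in for a virtqueue. Hypothetical; NOT the kernel's struct virtqueue. */
struct toy_vq {
	unsigned int num_free;      /* slots the driver may still fill */
	unsigned int pending;       /* filled by the driver, not yet consumed */
	unsigned int used;          /* consumed by the device, not yet reclaimed */
	unsigned int notifications; /* kicks that actually woke the device */
	bool device_asleep;
};

/* Every slot is in exactly one of the three states. */
static void toy_check(const struct toy_vq *vq)
{
	assert(vq->num_free + vq->pending + vq->used == RING_SIZE);
}

static void toy_init(struct toy_vq *vq)
{
	vq->num_free = RING_SIZE;
	vq->pending = 0;
	vq->used = 0;
	vq->notifications = 0;
	vq->device_asleep = true;
}

/* Models adding one entry: fails when the ring is full. */
static int toy_add(struct toy_vq *vq)
{
	if (!vq->num_free)
		return -1;
	vq->num_free--;
	vq->pending++;
	toy_check(vq);
	return 0;
}

/* Models the kick as described in this thread: wake the device if it
 * is asleep, otherwise do nothing. It never touches num_free. */
static void toy_kick(struct toy_vq *vq)
{
	if (!vq->device_asleep)
		return;
	vq->device_asleep = false;
	vq->notifications++;
}

/* Device side. Assumption: this runs asynchronously, outside the
 * driver's control, so the driver cannot force it to happen. */
static void toy_device_process(struct toy_vq *vq)
{
	if (vq->device_asleep)
		return;
	vq->used += vq->pending;
	vq->pending = 0;
	toy_check(vq);
}

/* Models reclaiming one used entry; the only place num_free grows. */
static bool toy_get_buf(struct toy_vq *vq)
{
	if (!vq->used)
		return false;
	vq->used--;
	vq->num_free++;
	toy_check(vq);
	return true;
}

/* Fill the ring, kick once, and report the free slots afterwards. */
static unsigned int toy_fill_and_kick(struct toy_vq *vq)
{
	while (toy_add(vq) == 0)
		;
	toy_kick(vq);
	return vq->num_free;
}
```

In this model toy_fill_and_kick() returns 0, and toy_get_buf() keeps
returning false until toy_device_process() has run. If the real ring behaves
like this, the while loop in my question above could spin for an unbounded
time with IRQs disabled, which is exactly what I am asking about.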