On 01/06/2023 20:45, Vivek Goyal wrote:
> On Thu, Jun 01, 2023 at 10:08:50AM -0400, Stefan Hajnoczi wrote:
>> On Wed, May 31, 2023 at 04:49:39PM -0400, Vivek Goyal wrote:
>>> On Wed, May 31, 2023 at 10:34:15PM +0200, Peter-Jan Gootzen wrote:
>>>> On 31/05/2023 21:18, Vivek Goyal wrote:
>>>>> On Wed, May 31, 2023 at 07:10:32PM +0200, Peter-Jan Gootzen wrote:
>>>>>> When the Virtio queue is full, a work item is scheduled
>>>>>> to execute in 1ms that retries adding the request to the queue.
>>>>>> This is a large amount of time at the scale at which a
>>>>>> virtio-fs device can operate. When using a DPU this is around
>>>>>> 40us baseline without going to a remote server (4k, QD=1).
>>>>>> This patch queues requests when the Virtio queue is full,
>>>>>> and when a completed request is taken off, immediately fills
>>>>>> it back up with queued requests.
>>>>>>
>>>>>> This reduces the 99.9th percentile latencies in our tests by
>>>>>> 60x and slightly increases the overall throughput, when using a
>>>>>> queue depth 2x the size of the Virtio queue size, with a
>>>>>> DPU-powered virtio-fs device.
>>>>>>
>>>>>> Signed-off-by: Peter-Jan Gootzen <peter-jan@xxxxxxxxxxx>
>>>>>> ---
>>>>>> V1 -> V2: No longer scheduling dispatch work when not needed,
>>>>>> and changed delayed_work structs to work_struct structs
>>>>>>
>>>>>>  fs/fuse/virtio_fs.c | 32 +++++++++++++++++---------------
>>>>>>  1 file changed, 17 insertions(+), 15 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
>>>>>> index 4d8d4f16c727..a676297db09b 100644
>>>>>> --- a/fs/fuse/virtio_fs.c
>>>>>> +++ b/fs/fuse/virtio_fs.c
>>>>>> @@ -45,7 +45,7 @@ struct virtio_fs_vq {
>>>>>>  	struct work_struct done_work;
>>>>>>  	struct list_head queued_reqs;
>>>>>>  	struct list_head end_reqs;	/* End these requests */
>>>>>> -	struct delayed_work dispatch_work;
>>>>>> +	struct work_struct dispatch_work;
>>>>>>  	struct fuse_dev *fud;
>>>>>>  	bool connected;
>>>>>>  	long in_flight;
>>>>>> @@ -202,7 +202,7 @@ static void virtio_fs_drain_queue(struct virtio_fs_vq *fsvq)
>>>>>>  	}
>>>>>>
>>>>>>  	flush_work(&fsvq->done_work);
>>>>>> -	flush_delayed_work(&fsvq->dispatch_work);
>>>>>> +	flush_work(&fsvq->dispatch_work);
>>>>>>  }
>>>>>>
>>>>>>  static void virtio_fs_drain_all_queues_locked(struct virtio_fs *fs)
>>>>>> @@ -346,6 +346,9 @@ static void virtio_fs_hiprio_done_work(struct work_struct *work)
>>>>>>  			dec_in_flight_req(fsvq);
>>>>>>  		}
>>>>>>  	} while (!virtqueue_enable_cb(vq) && likely(!virtqueue_is_broken(vq)));
>>>>>> +
>>>>>> +	if (!list_empty(&fsvq->queued_reqs))
>>>>>> +		schedule_work(&fsvq->dispatch_work);
>>>>>>  	spin_unlock(&fsvq->lock);
>>>>>>  }
>>>>>>
>>>>>> @@ -353,7 +356,7 @@ static void virtio_fs_request_dispatch_work(struct work_struct *work)
>>>>>>  {
>>>>>>  	struct fuse_req *req;
>>>>>>  	struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq,
>>>>>> -						 dispatch_work.work);
>>>>>> +						 dispatch_work);
>>>>>>  	int ret;
>>>>>>
>>>>>>  	pr_debug("virtio-fs: worker %s called.\n", __func__);
>>>>>> @@ -388,8 +391,6 @@ static void virtio_fs_request_dispatch_work(struct work_struct *work)
>>>>>>  		if (ret == -ENOMEM || ret == -ENOSPC) {
>>>>>>  			spin_lock(&fsvq->lock);
>>>>>>  			list_add_tail(&req->list, &fsvq->queued_reqs);
>>>>>> -			schedule_delayed_work(&fsvq->dispatch_work,
>>>>>> -					      msecs_to_jiffies(1));
>>>>>
>>>>> Virtqueue being full is only one of the reasons for failure to queue
>>>>> the request. What if the virtqueue is empty but we could not queue
>>>>> the request because of lack of memory (-ENOMEM)? In that case we
>>>>> will queue the request and it might not be dispatched because there
>>>>> is no completion. (Assume no further new request is coming.) That
>>>>> means a deadlock?
>>>>>
>>>>> Thanks
>>>>> Vivek
>>>>
>>>> Good catch, that will deadlock.
>>>>
>>>> Is the default kernel behavior to indefinitely retry a file system
>>>> request until memory is available?
>>>
>>> As of now that seems to be the behavior. I think I had copied this
>>> code from another driver.
>>>
>>> But I guess one can argue that if memory is not available, then
>>> return -ENOMEM to user space instead of retrying in the kernel.
>>>
>>> Stefan, Miklos, WDYT?
>>
>> My understanding is that file system syscalls may return ENOMEM, so
>> this is okay.
>
> Ok. Fair enough. Thanks.
>
> One more question: how do we know the virtqueue is full? Is -ENOSPC
> the correct error code to check and retry on indefinitely? Are there
> other situations where -ENOSPC can be returned? Peter's current patch
> relies on the fact that there is at least one completion happening
> after the queuing of a request, which will kick the worker thread and
> submit the request later.
>
> We need to watch out for race conditions very closely. If that
> assumption is not valid in some cases, or there are races between
> getting -ENOSPC and request completions, we will have a deadlock
> scenario.
>
> Thanks
> Vivek

When following the code path, -ENOSPC is only returned when the queue
is full, so that assumption is correct.

As for race conditions where a request might be lost: if the worker
runs but the queue has in the meantime become full again, it will
simply fail with -ENOSPC and be re-executed when something is taken
off the queue again. It is possible that this happens indefinitely,
but that is also possible in the current version of the driver.
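To make the -ENOMEM case concrete, here is a rough sketch (untested
and purely illustrative, based on my reading of the error path in
virtio_fs_wake_pending_and_unlock()) of retrying only on -ENOSPC,
where a completion is guaranteed to kick the dispatch work, and ending
the request with the error on -ENOMEM instead of requeueing it:

	ret = virtio_fs_enqueue_req(fsvq, req, false);
	if (ret < 0) {
		if (ret == -ENOSPC) {
			/*
			 * Virtqueue full: at least one request is in
			 * flight, so its completion will run done_work,
			 * find queued_reqs non-empty and schedule
			 * dispatch_work to retry submission.
			 */
			spin_lock(&fsvq->lock);
			list_add_tail(&req->list, &fsvq->queued_reqs);
			spin_unlock(&fsvq->lock);
			return;
		}
		/*
		 * -ENOMEM or any other error: no completion may ever
		 * arrive, so requeueing could deadlock. Hand the
		 * request to the dispatch worker to be ended with the
		 * error instead.
		 */
		req->out.h.error = ret;
		spin_lock(&fsvq->lock);
		list_add_tail(&req->list, &fsvq->end_reqs);
		schedule_work(&fsvq->dispatch_work);
		spin_unlock(&fsvq->lock);
	}

That way only the virtqueue-full case depends on the completion kick,
and an allocation failure propagates to user space as Stefan suggested.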

Peter-Jan