On 10/30/24 18:21, Joanne Koong wrote:
> On Tue, Oct 29, 2024 at 12:17 PM Bernd Schubert
> <bernd.schubert@xxxxxxxxxxx> wrote:
>>
>> On 10/11/24 21:13, Joanne Koong wrote:
>>> There are situations where fuse servers can become unresponsive or
>>> stuck, for example if the server is deadlocked. Currently, there's no
>>> good way to detect if a server is stuck and needs to be killed manually.
>>>
>>> This commit adds an option for enforcing a timeout (in minutes) for
>>> requests where if the timeout elapses without the server responding to
>>> the request, the connection will be automatically aborted.
>>>
>>> Please note that these timeouts are not 100% precise. The request may
>>> take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
>>> timeout due to how it's internally implemented.
>>>
>>> Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
>>> ---
>>>  fs/fuse/dev.c    | 80 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>  fs/fuse/fuse_i.h | 21 +++++++++++++
>>>  fs/fuse/inode.c  | 21 +++++++++++++
>>>  3 files changed, 122 insertions(+)
>>>
>>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>>> index 1f64ae6d7a69..054bfa2a26ed 100644
>>> --- a/fs/fuse/dev.c
>>> +++ b/fs/fuse/dev.c
>>> @@ -45,6 +45,82 @@ static struct fuse_dev *fuse_get_dev(struct file *file)
>>>  	return READ_ONCE(file->private_data);
>>>  }
>>>
>>> +static bool request_expired(struct fuse_conn *fc, struct fuse_req *req)
>>> +{
>>> +	return jiffies > req->create_time + fc->timeout.req_timeout;
>>> +}
>>> +
>>> +/*
>>> + * Check if any requests aren't being completed by the specified request
>>> + * timeout. To do so, we:
>>> + * - check the fiq pending list
>>> + * - check the bg queue
>>> + * - check the fpq io and processing lists
>>> + *
>>> + * To make this fast, we only check against the head request on each list since
>>> + * these are generally queued in order of creation time (eg newer requests get
>>> + * queued to the tail). We might miss a few edge cases (eg requests transitioning
>>> + * between lists, re-sent requests at the head of the pending list having a
>>> + * later creation time than other requests on that list, etc.) but that is fine
>>> + * since if the request never gets fulfilled, it will eventually be caught.
>>> + */
>>> +void fuse_check_timeout(struct timer_list *timer)
>>> +{
>>> +	struct fuse_conn *fc = container_of(timer, struct fuse_conn, timeout.timer);
>>> +	struct fuse_iqueue *fiq = &fc->iq;
>>> +	struct fuse_req *req;
>>> +	struct fuse_dev *fud;
>>> +	struct fuse_pqueue *fpq;
>>> +	bool expired = false;
>>> +	int i;
>>> +
>>> +	spin_lock(&fiq->lock);
>>> +	req = list_first_entry_or_null(&fiq->pending, struct fuse_req, list);
>>> +	if (req)
>>> +		expired = request_expired(fc, req);
>>> +	spin_unlock(&fiq->lock);
>>> +	if (expired)
>>> +		goto abort_conn;
>>> +
>>> +	spin_lock(&fc->bg_lock);
>>> +	req = list_first_entry_or_null(&fc->bg_queue, struct fuse_req, list);
>>> +	if (req)
>>> +		expired = request_expired(fc, req);
>>> +	spin_unlock(&fc->bg_lock);
>>> +	if (expired)
>>> +		goto abort_conn;
>>> +
>>> +	spin_lock(&fc->lock);
>>> +	if (!fc->connected) {
>>> +		spin_unlock(&fc->lock);
>>> +		return;
>>> +	}
>>> +	list_for_each_entry(fud, &fc->devices, entry) {
>>> +		fpq = &fud->pq;
>>> +		spin_lock(&fpq->lock);
>>> +		req = list_first_entry_or_null(&fpq->io, struct fuse_req, list);
>>> +		if (req && request_expired(fc, req))
>>> +			goto fpq_abort;
>>> +
>>> +		for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) {
>>> +			req = list_first_entry_or_null(&fpq->processing[i], struct fuse_req, list);
>>> +			if (req && request_expired(fc, req))
>>> +				goto fpq_abort;
>>> +		}
>>> +		spin_unlock(&fpq->lock);
>>> +	}
>>> +	spin_unlock(&fc->lock);
>>
>> I really don't have a strong opinion on that - I wonder if it wouldn't
>> be better for this part to have an extra timeout list per fud or pq as
>> previously. That would slightly increase memory usage and overhead per
>> request as a second list is needed, but would reduce these 1/min cpu
>> spikes as only one list per fud would need to be checked. But then, it
>> would be easy to change that later, if timeout checks turn out to be a
>> problem.
>>
>
> Thanks for the review.
>
> On v7 [1], which used an extra timeout list, the feedback was:
>
> "One thing I worry about is adding more roadblocks on the way to making
> request queuing more scalable.
>
> Currently there's fc->num_waiting that's touched on all requests and
> bg_queue/bg_lock that are touched on background requests. We should
> be trying to fix these bottlenecks instead of adding more.
>
> Can't we use the existing lists to scan requests?
>
> It's more complex, obviously, but at least it doesn't introduce yet
> another per-fc list to worry about."
>
>
> [1] https://lore.kernel.org/linux-fsdevel/CAJfpegs9A7iBbZpPMF-WuR48Ho_=z_ZWfjrLQG2ob0k6NbcaUg@xxxxxxxxxxxxxx/

Yeah I know, but I'm just a bit scared about the CPU spikes this gives
on recent systems (like the 96-core AMD machines we have in the lab),
especially when device clone is activated (and Miklos had also asked to
use the same hash lists to find requests with fuse-uring).

The previous discussion was mainly about avoiding a new global lock,
i.e. I wonder if we can't use a mix - avoid the new list and its lock,
but still avoid scanning through [FUSE_PQ_HASH_SIZE] lists per cloned
device/core (see the rough sketch below). I'm also not opposed to
taking the current version as is and optimizing it later.

Thanks,
Bernd
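
P.S.: To make the "mix" idea a bit more concrete, below is a rough,
untested sketch. The field and helper names (oldest_create_time,
fuse_pq_note_queued()) are made up for illustration and are not part
of this patch. The idea is a single per-fpq timestamp maintained under
the existing fpq->lock, so the timer reads one value per cloned device
instead of walking all FUSE_PQ_HASH_SIZE processing lists:

	/* hypothetical new member in struct fuse_pqueue (fuse_i.h):
	 * creation time of the oldest queued request, 0 when idle
	 */
	unsigned long oldest_create_time;

	/* called wherever a request is added to fpq->io or
	 * fpq->processing[], with fpq->lock already held
	 */
	static void fuse_pq_note_queued(struct fuse_pqueue *fpq,
					struct fuse_req *req)
	{
		if (!fpq->oldest_create_time ||
		    time_before(req->create_time, fpq->oldest_create_time))
			fpq->oldest_create_time = req->create_time;
	}

	/* the per-device part of fuse_check_timeout() then shrinks to
	 * a single comparison per fpq
	 */
	list_for_each_entry(fud, &fc->devices, entry) {
		fpq = &fud->pq;
		spin_lock(&fpq->lock);
		if (fpq->oldest_create_time &&
		    time_after(jiffies, fpq->oldest_create_time +
					fc->timeout.req_timeout))
			goto fpq_abort;
		spin_unlock(&fpq->lock);
	}

The catch is refreshing oldest_create_time when the oldest request
completes: that needs either a rescan of the lists at completion time
or a lazy reset, otherwise a stale timestamp could trigger false
aborts. That extra complexity is probably why the head-of-list checks
in the current version are the simpler choice for now.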