Re: [PATCH] fuse: add optional kernel-enforced timeout for fuse requests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 17, 2024 at 02:34:58PM -0700, Joanne Koong wrote:
> There are situations where fuse servers can become unresponsive or take
> too long to reply to a request. Currently there is no upper bound on
> how long a request may take, which may be frustrating to users who get
> stuck waiting for a request to complete.
> 
> This commit adds a daemon timeout option (in seconds) for fuse requests.
> If the timeout elapses before the request is replied to, the request will
> fail with -ETIME.
> 
> There are 3 possibilities for a request that times out:
> a) The request times out before the request has been sent to userspace
> b) The request times out after the request has been sent to userspace
> and before it receives a reply from the server
> c) The request times out after the request has been sent to userspace
> and the server replies while the kernel is timing out the request
> 
> Proper synchronization must be added to ensure that the request is
> handled correctly in all of these cases. To this effect, there is a new
> FR_PROCESSING bit added to the request flags, which is set atomically by
> either the timeout handler (see fuse_request_timeout()) which is invoked
> after the request timeout elapses or set by the request reply handler
> (see dev_do_write()), whichever gets there first.
> 
> If the reply handler and the timeout handler are executing simultaneously
> and the reply handler sets FR_PROCESSING before the timeout handler, then
> the request is re-queued onto the waitqueue and the kernel will process the
> reply as though the timeout did not elapse. If the timeout handler sets
> FR_PROCESSING before the reply handler, then the request will fail with
> -ETIME and the request will be cleaned up.
> 
> Proper acquires on the request reference must be added to ensure that the
> timeout handler does not drop the last refcount on the request while the
> reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is
> still accessing the request. (By "forwarder handler", this is the handler
> that forwards the request to userspace).
> 
> Currently, this is the lifecycle of the request refcount:
> 
> Request is created:
> fuse_simple_request -> allocates request, sets refcount to 1
>   __fuse_request_send -> acquires refcount
>     queues request and waits for reply...
> fuse_simple_request -> drops refcount
> 
> Request is freed:
> fuse_dev_do_write
>   fuse_request_end -> drops refcount on request
> 
> The timeout handler drops the refcount on the request so that the
> request is properly cleaned up if a reply is never received. Because of
> this, both the forwarder handler and the reply handler must acquire a refcount
> on the request while it accesses the request, and the refcount must be
> acquired while the lock of the list the request is on is held.
> 
> There is a potential race if the request is being forwarded to
> userspace while the timeout handler is executing (eg FR_PENDING has
> already been cleared but dev_do_read() hasn't finished executing). This
> is a problem because this would free the request but the request has not
> been removed from the fpq list it's on. To prevent this, dev_do_read()
> must check FR_PROCESSING at the end of its logic and remove the request
> from the fpq list if the timeout occurred.
> 
> There is also the case where the connection may be aborted or the
> device may be released while the timeout handler is running. To protect
> against an extra refcount drop on the request, the timeout handler
> checks the connected state of the list and lets the abort handler drop the
> last reference if the abort is running simultaneously. Similarly, the
> timeout handler also needs to check if the req->out.h.error is set to
> -ESTALE, which indicates that the device release is cleaning up the
> request. In both these cases, the timeout handler will return without
> dropping the refcount.
> 
> Please also note that background requests are not applicable for timeouts
> since they are asynchronous.
> 
> Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
> ---
>  fs/fuse/dev.c    | 177 ++++++++++++++++++++++++++++++++++++++++++++---
>  fs/fuse/fuse_i.h |  12 ++++
>  fs/fuse/inode.c  |   7 ++
>  3 files changed, 188 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 9eb191b5c4de..7dd7b244951b 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -331,6 +331,69 @@ void fuse_request_end(struct fuse_req *req)
>  }
>  EXPORT_SYMBOL_GPL(fuse_request_end);
>  
> +/* fuse_request_end for requests that timeout */
> +static void fuse_request_timeout(struct fuse_req *req)
> +{
> +	struct fuse_conn *fc = req->fm->fc;
> +	struct fuse_iqueue *fiq = &fc->iq;
> +	struct fuse_pqueue *fpq;
> +
> +	spin_lock(&fiq->lock);
> +	if (!fiq->connected) {
> +		spin_unlock(&fiq->lock);
> +		/*
> +		 * Connection is being aborted. The abort will release
> +		 * the refcount on the request
> +		 */
> +		req->out.h.error = -ECONNABORTED;
> +		return;
> +	}
> +	if (test_bit(FR_PENDING, &req->flags)) {
> +		/* Request is not yet in userspace, bail out */
> +		list_del(&req->list);
> +		spin_unlock(&fiq->lock);
> +		req->out.h.error = -ETIME;
> +		__fuse_put_request(req);

Why is this safe?  We could be the last holder of the reference on this request
correct?  The only places using __fuse_put_request() would be where we are in a
path where the caller already holds a reference on the request.  Since this is
async it may not be the case right?

If it is safe then it's just confusing and warrants a comment.

> +		return;
> +	}
> +	if (test_bit(FR_INTERRUPTED, &req->flags))
> +		list_del_init(&req->intr_entry);
> +
> +	fpq = req->fpq;
> +	spin_unlock(&fiq->lock);
> +
> +	if (fpq) {
> +		spin_lock(&fpq->lock);
> +		if (!fpq->connected && (!test_bit(FR_PRIVATE, &req->flags))) {
                                       ^^

You don't need the extra () there.

> +			spin_unlock(&fpq->lock);
> +			/*
> +			 * Connection is being aborted. The abort will release
> +			 * the refcount on the request
> +			 */
> +			req->out.h.error = -ECONNABORTED;
> +			return;
> +		}
> +		if (req->out.h.error == -ESTALE) {
> +			/*
> +			 * Device is being released. The fuse_dev_release call
> +			 * will drop the refcount on the request
> +			 */
> +			spin_unlock(&fpq->lock);
> +			return;
> +		}
> +		if (!test_bit(FR_PRIVATE, &req->flags))
> +			list_del_init(&req->list);
> +		spin_unlock(&fpq->lock);
> +	}
> +
> +	req->out.h.error = -ETIME;
> +
> +	if (test_bit(FR_ASYNC, &req->flags))
> +		req->args->end(req->fm, req->args, req->out.h.error);
> +
> +	fuse_put_request(req);
> +}

Just a general styling thing, we have two different states for requests here,
PENDING and !PENDING correct?  I think it may be better to do something like

if (test_bit(FR_PENDING, &req->flags))
	timeout_pending_req();
else
	timeout_inflight_req();

and then in timeout_pending_req() you do

spin_lock(&fiq->lock);
if (!test_bit(FR_PENDING, &req->flags)) {
	spin_unlock(&fiq_lock);
	timeout_inflight_req();
	return;
}

This will keep the two different state cleanup functions separate and a little
cleaner to grok.

> +
>  static int queue_interrupt(struct fuse_req *req)
>  {
>  	struct fuse_iqueue *fiq = &req->fm->fc->iq;
> @@ -361,6 +424,62 @@ static int queue_interrupt(struct fuse_req *req)
>  	return 0;
>  }
>  
> +enum wait_type {
> +	WAIT_TYPE_INTERRUPTIBLE,
> +	WAIT_TYPE_KILLABLE,
> +	WAIT_TYPE_NONINTERRUPTIBLE,
> +};
> +
> +static int fuse_wait_event_interruptible_timeout(struct fuse_req *req)
> +{
> +	struct fuse_conn *fc = req->fm->fc;
> +
> +	return wait_event_interruptible_timeout(req->waitq,
> +						test_bit(FR_FINISHED,
> +							 &req->flags),
> +						fc->daemon_timeout);
> +}
> +ALLOW_ERROR_INJECTION(fuse_wait_event_interruptible_timeout, ERRNO);
> +
> +static int wait_answer_timeout(struct fuse_req *req, enum wait_type type)
> +{
> +	struct fuse_conn *fc = req->fm->fc;
> +	int err;
> +
> +wait_answer_start:
> +	if (type == WAIT_TYPE_INTERRUPTIBLE)
> +		err = fuse_wait_event_interruptible_timeout(req);
> +	else if (type == WAIT_TYPE_KILLABLE)
> +		err = wait_event_killable_timeout(req->waitq,
> +						  test_bit(FR_FINISHED, &req->flags),
> +						  fc->daemon_timeout);
> +
> +	else if (type == WAIT_TYPE_NONINTERRUPTIBLE)
> +		err = wait_event_timeout(req->waitq, test_bit(FR_FINISHED, &req->flags),
> +					 fc->daemon_timeout);
> +	else
> +		WARN_ON(1);

This will leak some random value for err, so initialize err to something that
will be dealt with, like -EINVAL;

> +
> +	/* request was answered */
> +	if (err > 0)
> +		return 0;
> +
> +	/* request was not answered in time */
> +	if (err == 0) {
> +		if (test_and_set_bit(FR_PROCESSING, &req->flags))
> +			/* request reply is being processed by kernel right now.
> +			 * we should wait for the answer.
> +			 */

Format for multiline comments is

/*
 * blah
 * blah
 */

and since this is a 1 line if statement put it above the if statement.

> +			goto wait_answer_start;
> +
> +		fuse_request_timeout(req);
> +		return 0;
> +	}
> +
> +	/* else request was interrupted */
> +	return err;
> +}
> +
>  static void request_wait_answer(struct fuse_req *req)
>  {
>  	struct fuse_conn *fc = req->fm->fc;
> @@ -369,8 +488,11 @@ static void request_wait_answer(struct fuse_req *req)
>  
>  	if (!fc->no_interrupt) {
>  		/* Any signal may interrupt this */
> -		err = wait_event_interruptible(req->waitq,
> -					test_bit(FR_FINISHED, &req->flags));
> +		if (fc->daemon_timeout)
> +			err = wait_answer_timeout(req, WAIT_TYPE_INTERRUPTIBLE);
> +		else
> +			err = wait_event_interruptible(req->waitq,
> +						       test_bit(FR_FINISHED, &req->flags));
>  		if (!err)
>  			return;
>  
> @@ -383,8 +505,11 @@ static void request_wait_answer(struct fuse_req *req)
>  
>  	if (!test_bit(FR_FORCE, &req->flags)) {
>  		/* Only fatal signals may interrupt this */
> -		err = wait_event_killable(req->waitq,
> -					test_bit(FR_FINISHED, &req->flags));
> +		if (fc->daemon_timeout)
> +			err = wait_answer_timeout(req, WAIT_TYPE_KILLABLE);
> +		else
> +			err = wait_event_killable(req->waitq,
> +						  test_bit(FR_FINISHED, &req->flags));
>  		if (!err)
>  			return;
>  
> @@ -404,7 +529,10 @@ static void request_wait_answer(struct fuse_req *req)
>  	 * Either request is already in userspace, or it was forced.
>  	 * Wait it out.
>  	 */
> -	wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
> +	if (fc->daemon_timeout)
> +		wait_answer_timeout(req, WAIT_TYPE_NONINTERRUPTIBLE);
> +	else
> +		wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
>  }
>  
>  static void __fuse_request_send(struct fuse_req *req)
> @@ -1268,6 +1396,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>  	req = list_entry(fiq->pending.next, struct fuse_req, list);
>  	clear_bit(FR_PENDING, &req->flags);
>  	list_del_init(&req->list);
> +	/* Acquire a reference since fuse_request_timeout may also be executing  */
> +	__fuse_get_request(req);
> +	req->fpq = fpq;
>  	spin_unlock(&fiq->lock);
>  

There's a race here with completion.  If we timeout a request right here, we can
end up sending that same request below.

You are going to need to check

test_bit(FR_PROCESSING)

after you take the fpq->lock just below here to make sure you didn't race with
the timeout handler and time the request out already.

>  	args = req->args;
> @@ -1280,6 +1411,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>  		if (args->opcode == FUSE_SETXATTR)
>  			req->out.h.error = -E2BIG;
>  		fuse_request_end(req);
> +		fuse_put_request(req);
>  		goto restart;
>  	}
>  	spin_lock(&fpq->lock);
> @@ -1316,13 +1448,23 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>  	}
>  	hash = fuse_req_hash(req->in.h.unique);
>  	list_move_tail(&req->list, &fpq->processing[hash]);
> -	__fuse_get_request(req);
>  	set_bit(FR_SENT, &req->flags);
>  	spin_unlock(&fpq->lock);
>  	/* matches barrier in request_wait_answer() */
>  	smp_mb__after_atomic();
>  	if (test_bit(FR_INTERRUPTED, &req->flags))
>  		queue_interrupt(req);
> +
> +	/* Check if request timed out */
> +	if (test_bit(FR_PROCESSING, &req->flags)) {
> +		spin_lock(&fpq->lock);
> +		if (!test_bit(FR_PRIVATE, &req->flags))
> +			list_del_init(&req->list);
> +		spin_unlock(&fpq->lock);
> +		fuse_put_request(req);
> +		return -ETIME;
> +	}

This isn't quite right, we could have FR_PROCESSING set because we completed the
request before we got here.  If you put a schedule_timeout(HZ); right above this
you could easily see where a request gets completed by userspace, but now you've
fimed it out.

Additionally if we have FR_PROCESSING set from the timeout, shouldn't this
cleanup have been done already?  I don't understand why we need to handle this
here, we should just return and whoever is waiting on the request will get the
error.

Thanks,

Josef




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux