On Thu, Jul 18, 2024 at 7:44 AM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote: > > On Wed, Jul 17, 2024 at 02:34:58PM -0700, Joanne Koong wrote: ... > > --- > > fs/fuse/dev.c | 177 ++++++++++++++++++++++++++++++++++++++++++++--- > > fs/fuse/fuse_i.h | 12 ++++ > > fs/fuse/inode.c | 7 ++ > > 3 files changed, 188 insertions(+), 8 deletions(-) > > > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > > index 9eb191b5c4de..7dd7b244951b 100644 > > --- a/fs/fuse/dev.c > > +++ b/fs/fuse/dev.c > > @@ -331,6 +331,69 @@ void fuse_request_end(struct fuse_req *req) > > } > > EXPORT_SYMBOL_GPL(fuse_request_end); > > > > +/* fuse_request_end for requests that timeout */ > > +static void fuse_request_timeout(struct fuse_req *req) > > +{ > > + struct fuse_conn *fc = req->fm->fc; > > + struct fuse_iqueue *fiq = &fc->iq; > > + struct fuse_pqueue *fpq; > > + > > + spin_lock(&fiq->lock); > > + if (!fiq->connected) { > > + spin_unlock(&fiq->lock); > > + /* > > + * Connection is being aborted. The abort will release > > + * the refcount on the request > > + */ > > + req->out.h.error = -ECONNABORTED; > > + return; > > + } > > + if (test_bit(FR_PENDING, &req->flags)) { > > + /* Request is not yet in userspace, bail out */ > > + list_del(&req->list); > > + spin_unlock(&fiq->lock); > > + req->out.h.error = -ETIME; > > + __fuse_put_request(req); > > Why is this safe? We could be the last holder of the reference on this request > correct? The only places using __fuse_put_request() would be where we are in a > path where the caller already holds a reference on the request. Since this is > async it may not be the case right? > > If it is safe then it's just confusing and warrants a comment. > There is always a refcount still held on the request by fuse_simple_request() when this is called. I'll add a comment about this. I also just noticed that I use fuse_put_request() at the end of this function, I'll change that to __fuse_put_request() as well. > > + return; > > + } > > + if (test_bit(FR_INTERRUPTED, &req->flags)) > > + list_del_init(&req->intr_entry); > > + > > + fpq = req->fpq; > > + spin_unlock(&fiq->lock); > > + > > + if (fpq) { > > + spin_lock(&fpq->lock); > > + if (!fpq->connected && (!test_bit(FR_PRIVATE, &req->flags))) { > ^^ > > You don't need the extra () there. > > > + spin_unlock(&fpq->lock); > > + /* > > + * Connection is being aborted. The abort will release > > + * the refcount on the request > > + */ > > + req->out.h.error = -ECONNABORTED; > > + return; > > + } > > + if (req->out.h.error == -ESTALE) { > > + /* > > + * Device is being released. The fuse_dev_release call > > + * will drop the refcount on the request > > + */ > > + spin_unlock(&fpq->lock); > > + return; > > + } > > + if (!test_bit(FR_PRIVATE, &req->flags)) > > + list_del_init(&req->list); > > + spin_unlock(&fpq->lock); > > + } > > + > > + req->out.h.error = -ETIME; > > + > > + if (test_bit(FR_ASYNC, &req->flags)) > > + req->args->end(req->fm, req->args, req->out.h.error); > > + > > + fuse_put_request(req); > > +} > > Just a general styling thing, we have two different states for requests here, > PENDING and !PENDING correct? I think it may be better to do something like > > if (test_bit(FR_PENDING, &req->flags)) > timeout_pending_req(); > else > timeout_inflight_req(); > > and then in timeout_pending_req() you do > > spin_lock(&fiq->lock); > if (!test_bit(FR_PENDING, &req->flags)) { > spin_unlock(&fiq_lock); > timeout_inflight_req(); > return; > } > > This will keep the two different state cleanup functions separate and a little > cleaner to grok. > Thanks for the suggestion, I will make this change for v2. > > + > > static int queue_interrupt(struct fuse_req *req) > > { > > struct fuse_iqueue *fiq = &req->fm->fc->iq; > > @@ -361,6 +424,62 @@ static int queue_interrupt(struct fuse_req *req) > > return 0; > > } > > > > +enum wait_type { > > + WAIT_TYPE_INTERRUPTIBLE, > > + WAIT_TYPE_KILLABLE, > > + WAIT_TYPE_NONINTERRUPTIBLE, > > +}; > > + > > +static int fuse_wait_event_interruptible_timeout(struct fuse_req *req) > > +{ > > + struct fuse_conn *fc = req->fm->fc; > > + > > + return wait_event_interruptible_timeout(req->waitq, > > + test_bit(FR_FINISHED, > > + &req->flags), > > + fc->daemon_timeout); > > +} > > +ALLOW_ERROR_INJECTION(fuse_wait_event_interruptible_timeout, ERRNO); > > + > > +static int wait_answer_timeout(struct fuse_req *req, enum wait_type type) > > +{ > > + struct fuse_conn *fc = req->fm->fc; > > + int err; > > + > > +wait_answer_start: > > + if (type == WAIT_TYPE_INTERRUPTIBLE) > > + err = fuse_wait_event_interruptible_timeout(req); > > + else if (type == WAIT_TYPE_KILLABLE) > > + err = wait_event_killable_timeout(req->waitq, > > + test_bit(FR_FINISHED, &req->flags), > > + fc->daemon_timeout); > > + > > + else if (type == WAIT_TYPE_NONINTERRUPTIBLE) > > + err = wait_event_timeout(req->waitq, test_bit(FR_FINISHED, &req->flags), > > + fc->daemon_timeout); > > + else > > + WARN_ON(1); > > This will leak some random value for err, so initialize err to something that > will be dealt with, like -EINVAL; > > > + > > + /* request was answered */ > > + if (err > 0) > > + return 0; > > + > > + /* request was not answered in time */ > > + if (err == 0) { > > + if (test_and_set_bit(FR_PROCESSING, &req->flags)) > > + /* request reply is being processed by kernel right now. > > + * we should wait for the answer. > > + */ > > Format for multiline comments is > > /* > * blah > * blah > */ > > and since this is a 1 line if statement put it above the if statement. > > > + goto wait_answer_start; > > + > > + fuse_request_timeout(req); > > + return 0; > > + } > > + > > + /* else request was interrupted */ > > + return err; > > +} > > + > > static void request_wait_answer(struct fuse_req *req) > > { > > struct fuse_conn *fc = req->fm->fc; > > @@ -369,8 +488,11 @@ static void request_wait_answer(struct fuse_req *req) > > > > if (!fc->no_interrupt) { > > /* Any signal may interrupt this */ > > - err = wait_event_interruptible(req->waitq, > > - test_bit(FR_FINISHED, &req->flags)); > > + if (fc->daemon_timeout) > > + err = wait_answer_timeout(req, WAIT_TYPE_INTERRUPTIBLE); > > + else > > + err = wait_event_interruptible(req->waitq, > > + test_bit(FR_FINISHED, &req->flags)); > > if (!err) > > return; > > > > @@ -383,8 +505,11 @@ static void request_wait_answer(struct fuse_req *req) > > > > if (!test_bit(FR_FORCE, &req->flags)) { > > /* Only fatal signals may interrupt this */ > > - err = wait_event_killable(req->waitq, > > - test_bit(FR_FINISHED, &req->flags)); > > + if (fc->daemon_timeout) > > + err = wait_answer_timeout(req, WAIT_TYPE_KILLABLE); > > + else > > + err = wait_event_killable(req->waitq, > > + test_bit(FR_FINISHED, &req->flags)); > > if (!err) > > return; > > > > @@ -404,7 +529,10 @@ static void request_wait_answer(struct fuse_req *req) > > * Either request is already in userspace, or it was forced. > > * Wait it out. > > */ > > - wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); > > + if (fc->daemon_timeout) > > + wait_answer_timeout(req, WAIT_TYPE_NONINTERRUPTIBLE); > > + else > > + wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); > > } > > > > static void __fuse_request_send(struct fuse_req *req) > > @@ -1268,6 +1396,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > > req = list_entry(fiq->pending.next, struct fuse_req, list); > > clear_bit(FR_PENDING, &req->flags); > > list_del_init(&req->list); > > + /* Acquire a reference since fuse_request_timeout may also be executing */ > > + __fuse_get_request(req); > > + req->fpq = fpq; > > spin_unlock(&fiq->lock); > > > > There's a race here with completion. If we timeout a request right here, we can > end up sending that same request below. I don't think there's any way around this unless we take the fpq lock while we do the fuse_copy stuff, because even if we check the FR_PROCESSING bit, the timeout handler could start running after the fpq lock is released when we do the fuse_copy calls. In my point of view, I don't think this race matters. We could have this situation happen on a regular timed-out request. For example, we send out a request to userspace and if the server takes too long to reply, the request is cancelled/invalidated in the kernel but the server will still see the request anyways. WDYT? > > You are going to need to check > > test_bit(FR_PROCESSING) > > after you take the fpq->lock just below here to make sure you didn't race with > the timeout handler and time the request out already. > > > args = req->args; > > @@ -1280,6 +1411,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > > if (args->opcode == FUSE_SETXATTR) > > req->out.h.error = -E2BIG; > > fuse_request_end(req); > > + fuse_put_request(req); > > goto restart; > > } > > spin_lock(&fpq->lock); > > @@ -1316,13 +1448,23 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > > } > > hash = fuse_req_hash(req->in.h.unique); > > list_move_tail(&req->list, &fpq->processing[hash]); > > - __fuse_get_request(req); > > set_bit(FR_SENT, &req->flags); > > spin_unlock(&fpq->lock); > > /* matches barrier in request_wait_answer() */ > > smp_mb__after_atomic(); > > if (test_bit(FR_INTERRUPTED, &req->flags)) > > queue_interrupt(req); > > + > > + /* Check if request timed out */ > > + if (test_bit(FR_PROCESSING, &req->flags)) { > > + spin_lock(&fpq->lock); > > + if (!test_bit(FR_PRIVATE, &req->flags)) > > + list_del_init(&req->list); > > + spin_unlock(&fpq->lock); > > + fuse_put_request(req); > > + return -ETIME; > > + } > > This isn't quite right, we could have FR_PROCESSING set because we completed the > request before we got here. If you put a schedule_timeout(HZ); right above this > you could easily see where a request gets completed by userspace, but now you've > fimed it out. Oh I see, you're talking about the race where a request is replied to immediately after the fuse_copy calls and before this gets called. Then when we get here, we can't differentiate between whether FR_PROCESSING was set by the timeout handler or the reply handler. I think the simplest way around this is to check if the FR_SENT flag was cleared (the reply handler clears it while holding the fpq lock where FR_PROCESSING gets set and the timeout handler doesn't clear it), then return -ETIME if it wasn't and 0 if it was. I'll add this into v2. > > Additionally if we have FR_PROCESSING set from the timeout, shouldn't this > cleanup have been done already? I don't understand why we need to handle this > here, we should just return and whoever is waiting on the request will get the > error. In most cases yes, but there is a race where the timeout handler may finish executing before the logic in dev_do_read that adds the request to the fpq lists. If this happens, the freed request will remain on the list. i think this race currently exists prior to these changes as well - in the case you mentioned above where the request may have been completed right after the fuse_copy calls in dev_do_read and before dev_do_read moves the request to the fpq lists. We would get into the same situation with a freed request still on the list. Thanks, Joanne > > Thanks, > > Josef