On Thu, Jul 18, 2024 at 9:58 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>
> On Thu, Jul 18, 2024 at 7:44 AM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
> >
> > On Wed, Jul 17, 2024 at 02:34:58PM -0700, Joanne Koong wrote:
> ...
> > > ---
> > >  fs/fuse/dev.c    | 177 ++++++++++++++++++++++++++++++++++++++++++++---
> > >  fs/fuse/fuse_i.h |  12 ++++
> > >  fs/fuse/inode.c  |   7 ++
> > >  3 files changed, 188 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > > index 9eb191b5c4de..7dd7b244951b 100644
> > > --- a/fs/fuse/dev.c
> > > +++ b/fs/fuse/dev.c
> > > @@ -331,6 +331,69 @@ void fuse_request_end(struct fuse_req *req)
> > >  }
> > >  EXPORT_SYMBOL_GPL(fuse_request_end);
> > >
> > > +/* fuse_request_end for requests that timeout */
> > > +static void fuse_request_timeout(struct fuse_req *req)
> > > +{
> > > +        struct fuse_conn *fc = req->fm->fc;
> > > +        struct fuse_iqueue *fiq = &fc->iq;
> > > +        struct fuse_pqueue *fpq;
> > > +
> > > +        spin_lock(&fiq->lock);
> > > +        if (!fiq->connected) {
> > > +                spin_unlock(&fiq->lock);
> > > +                /*
> > > +                 * Connection is being aborted. The abort will release
> > > +                 * the refcount on the request
> > > +                 */
> > > +                req->out.h.error = -ECONNABORTED;
> > > +                return;
> > > +        }
> > > +        if (test_bit(FR_PENDING, &req->flags)) {
> > > +                /* Request is not yet in userspace, bail out */
> > > +                list_del(&req->list);
> > > +                spin_unlock(&fiq->lock);
> > > +                req->out.h.error = -ETIME;
> > > +                __fuse_put_request(req);
> >
> > Why is this safe? We could be the last holder of the reference on this request
> > correct? The only places using __fuse_put_request() would be where we are in a
> > path where the caller already holds a reference on the request. Since this is
> > async it may not be the case right?
> >
> > If it is safe then it's just confusing and warrants a comment.
> >
>
> There is always a refcount still held on the request by
> fuse_simple_request() when this is called. I'll add a comment about
> this.
> I also just noticed that I use fuse_put_request() at the end of this
> function, I'll change that to __fuse_put_request() as well.
>
> > > +                return;
> > > +        }
> > > +        if (test_bit(FR_INTERRUPTED, &req->flags))
> > > +                list_del_init(&req->intr_entry);
> > > +
> > > +        fpq = req->fpq;
> > > +        spin_unlock(&fiq->lock);
> > > +
> > > +        if (fpq) {
> > > +                spin_lock(&fpq->lock);
> > > +                if (!fpq->connected && (!test_bit(FR_PRIVATE, &req->flags))) {
> >                                            ^^
> >
> > You don't need the extra () there.
> >
> > > +                        spin_unlock(&fpq->lock);
> > > +                        /*
> > > +                         * Connection is being aborted. The abort will release
> > > +                         * the refcount on the request
> > > +                         */
> > > +                        req->out.h.error = -ECONNABORTED;
> > > +                        return;
> > > +                }
> > > +                if (req->out.h.error == -ESTALE) {
> > > +                        /*
> > > +                         * Device is being released. The fuse_dev_release call
> > > +                         * will drop the refcount on the request
> > > +                         */
> > > +                        spin_unlock(&fpq->lock);
> > > +                        return;
> > > +                }
> > > +                if (!test_bit(FR_PRIVATE, &req->flags))
> > > +                        list_del_init(&req->list);
> > > +                spin_unlock(&fpq->lock);
> > > +        }
> > > +
> > > +        req->out.h.error = -ETIME;
> > > +
> > > +        if (test_bit(FR_ASYNC, &req->flags))
> > > +                req->args->end(req->fm, req->args, req->out.h.error);
> > > +
> > > +        fuse_put_request(req);
> > > +}
> >
> > Just a general styling thing, we have two different states for requests here,
> > PENDING and !PENDING correct? I think it may be better to do something like
> >
> > if (test_bit(FR_PENDING, &req->flags))
> >         timeout_pending_req();
> > else
> >         timeout_inflight_req();
> >
> > and then in timeout_pending_req() you do
> >
> > spin_lock(&fiq->lock);
> > if (!test_bit(FR_PENDING, &req->flags)) {
> >         spin_unlock(&fiq_lock);
> >         timeout_inflight_req();
> >         return;
> > }
> >
> > This will keep the two different state cleanup functions separate and a little
> > cleaner to grok.
> >
>
> Thanks for the suggestion, I will make this change for v2.
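
To make the v2 direction concrete, here is a rough, untested sketch of that
split (the helper names are the ones Josef suggested; timeout_inflight_req()
would just be the !FR_PENDING half of the current fuse_request_timeout(), and
the connection-abort handling is elided here):

/* !FR_PENDING half of the current fuse_request_timeout() */
static void timeout_inflight_req(struct fuse_req *req);

static void timeout_pending_req(struct fuse_req *req)
{
        struct fuse_iqueue *fiq = &req->fm->fc->iq;

        spin_lock(&fiq->lock);
        if (!test_bit(FR_PENDING, &req->flags)) {
                /* Raced with fuse_dev_do_read(), the request is in flight now */
                spin_unlock(&fiq->lock);
                timeout_inflight_req(req);
                return;
        }
        /* Request is not yet in userspace, just pull it off the pending list */
        list_del(&req->list);
        spin_unlock(&fiq->lock);
        req->out.h.error = -ETIME;
        /* fuse_simple_request() still holds a reference, so this is not the last put */
        __fuse_put_request(req);
}

static void fuse_request_timeout(struct fuse_req *req)
{
        if (test_bit(FR_PENDING, &req->flags))
                timeout_pending_req(req);
        else
                timeout_inflight_req(req);
}
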
> > > +
> > >  static int queue_interrupt(struct fuse_req *req)
> > >  {
> > >          struct fuse_iqueue *fiq = &req->fm->fc->iq;
> > > @@ -361,6 +424,62 @@ static int queue_interrupt(struct fuse_req *req)
> > >          return 0;
> > >  }
> > >
> > > +enum wait_type {
> > > +        WAIT_TYPE_INTERRUPTIBLE,
> > > +        WAIT_TYPE_KILLABLE,
> > > +        WAIT_TYPE_NONINTERRUPTIBLE,
> > > +};
> > > +
> > > +static int fuse_wait_event_interruptible_timeout(struct fuse_req *req)
> > > +{
> > > +        struct fuse_conn *fc = req->fm->fc;
> > > +
> > > +        return wait_event_interruptible_timeout(req->waitq,
> > > +                                                test_bit(FR_FINISHED,
> > > +                                                         &req->flags),
> > > +                                                fc->daemon_timeout);
> > > +}
> > > +ALLOW_ERROR_INJECTION(fuse_wait_event_interruptible_timeout, ERRNO);
> > > +
> > > +static int wait_answer_timeout(struct fuse_req *req, enum wait_type type)
> > > +{
> > > +        struct fuse_conn *fc = req->fm->fc;
> > > +        int err;
> > > +
> > > +wait_answer_start:
> > > +        if (type == WAIT_TYPE_INTERRUPTIBLE)
> > > +                err = fuse_wait_event_interruptible_timeout(req);
> > > +        else if (type == WAIT_TYPE_KILLABLE)
> > > +                err = wait_event_killable_timeout(req->waitq,
> > > +                                                  test_bit(FR_FINISHED, &req->flags),
> > > +                                                  fc->daemon_timeout);
> > > +
> > > +        else if (type == WAIT_TYPE_NONINTERRUPTIBLE)
> > > +                err = wait_event_timeout(req->waitq, test_bit(FR_FINISHED, &req->flags),
> > > +                                         fc->daemon_timeout);
> > > +        else
> > > +                WARN_ON(1);
> >
> > This will leak some random value for err, so initialize err to something that
> > will be dealt with, like -EINVAL;
> >
> > > +
> > > +        /* request was answered */
> > > +        if (err > 0)
> > > +                return 0;
> > > +
> > > +        /* request was not answered in time */
> > > +        if (err == 0) {
> > > +                if (test_and_set_bit(FR_PROCESSING, &req->flags))
> > > +                        /* request reply is being processed by kernel right now.
> > > +                         * we should wait for the answer.
> > > +                         */
> >
> > Format for multiline comments is
> >
> > /*
> >  * blah
> >  * blah
> >  */
> >
> > and since this is a 1 line if statement put it above the if statement.
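
For v2, that check would then read something like this (comment reflowed and
hoisted above the one-line if):

                /*
                 * The request reply is being processed by the kernel right
                 * now, we should wait for the answer.
                 */
                if (test_and_set_bit(FR_PROCESSING, &req->flags))
                        goto wait_answer_start;
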
> >
> > > +                        goto wait_answer_start;
> > > +
> > > +                fuse_request_timeout(req);
> > > +                return 0;
> > > +        }
> > > +
> > > +        /* else request was interrupted */
> > > +        return err;
> > > +}
> > > +
> > >  static void request_wait_answer(struct fuse_req *req)
> > >  {
> > >          struct fuse_conn *fc = req->fm->fc;
> > > @@ -369,8 +488,11 @@ static void request_wait_answer(struct fuse_req *req)
> > >
> > >          if (!fc->no_interrupt) {
> > >                  /* Any signal may interrupt this */
> > > -                err = wait_event_interruptible(req->waitq,
> > > -                                               test_bit(FR_FINISHED, &req->flags));
> > > +                if (fc->daemon_timeout)
> > > +                        err = wait_answer_timeout(req, WAIT_TYPE_INTERRUPTIBLE);
> > > +                else
> > > +                        err = wait_event_interruptible(req->waitq,
> > > +                                                       test_bit(FR_FINISHED, &req->flags));
> > >                  if (!err)
> > >                          return;
> > >
> > > @@ -383,8 +505,11 @@ static void request_wait_answer(struct fuse_req *req)
> > >
> > >          if (!test_bit(FR_FORCE, &req->flags)) {
> > >                  /* Only fatal signals may interrupt this */
> > > -                err = wait_event_killable(req->waitq,
> > > -                                          test_bit(FR_FINISHED, &req->flags));
> > > +                if (fc->daemon_timeout)
> > > +                        err = wait_answer_timeout(req, WAIT_TYPE_KILLABLE);
> > > +                else
> > > +                        err = wait_event_killable(req->waitq,
> > > +                                                  test_bit(FR_FINISHED, &req->flags));
> > >                  if (!err)
> > >                          return;
> > >
> > > @@ -404,7 +529,10 @@ static void request_wait_answer(struct fuse_req *req)
> > >           * Either request is already in userspace, or it was forced.
> > >           * Wait it out.
> > >           */
> > > -        wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
> > > +        if (fc->daemon_timeout)
> > > +                wait_answer_timeout(req, WAIT_TYPE_NONINTERRUPTIBLE);
> > > +        else
> > > +                wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
> > >  }
> > >
> > >  static void __fuse_request_send(struct fuse_req *req)
> > > @@ -1268,6 +1396,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> > >          req = list_entry(fiq->pending.next, struct fuse_req, list);
> > >          clear_bit(FR_PENDING, &req->flags);
> > >          list_del_init(&req->list);
> > > +        /* Acquire a reference since fuse_request_timeout may also be executing */
> > > +        __fuse_get_request(req);
> > > +        req->fpq = fpq;
> > >          spin_unlock(&fiq->lock);
> > >
> > There's a race here with completion. If we timeout a request right here, we can
> > end up sending that same request below.
>
> I don't think there's any way around this unless we take the fpq lock
> while we do the fuse_copy stuff, because even if we check the
> FR_PROCESSING bit, the timeout handler could start running after the
> fpq lock is released when we do the fuse_copy calls.
>
> In my point of view, I don't think this race matters. We could have
> this situation happen on a regular timed-out request. For example, we
> send out a request to userspace and if the server takes too long to
> reply, the request is cancelled/invalidated in the kernel but the
> server will still see the request anyways.
>
> WDYT?
>
> > You are going to need to check
> >
> > test_bit(FR_PROCESSING)
> >
> > after you take the fpq->lock just below here to make sure you didn't race with
> > the timeout handler and time the request out already.
> >
> > >          args = req->args;
> > > @@ -1280,6 +1411,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> > >                  if (args->opcode == FUSE_SETXATTR)
> > >                          req->out.h.error = -E2BIG;
> > >                  fuse_request_end(req);
> > > +                fuse_put_request(req);
> > >                  goto restart;
> > >          }
> > >          spin_lock(&fpq->lock);
> > > @@ -1316,13 +1448,23 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> > >          }
> > >          hash = fuse_req_hash(req->in.h.unique);
> > >          list_move_tail(&req->list, &fpq->processing[hash]);
> > > -        __fuse_get_request(req);
> > >          set_bit(FR_SENT, &req->flags);
> > >          spin_unlock(&fpq->lock);
> > >          /* matches barrier in request_wait_answer() */
> > >          smp_mb__after_atomic();
> > >          if (test_bit(FR_INTERRUPTED, &req->flags))
> > >                  queue_interrupt(req);
> > > +
> > > +        /* Check if request timed out */
> > > +        if (test_bit(FR_PROCESSING, &req->flags)) {
> > > +                spin_lock(&fpq->lock);
> > > +                if (!test_bit(FR_PRIVATE, &req->flags))
> > > +                        list_del_init(&req->list);
> > > +                spin_unlock(&fpq->lock);
> > > +                fuse_put_request(req);
> > > +                return -ETIME;
> > > +        }
> >
> > This isn't quite right, we could have FR_PROCESSING set because we completed the
> > request before we got here. If you put a schedule_timeout(HZ); right above this
> > you could easily see where a request gets completed by userspace, but now you've
> > timed it out.
>
> Oh I see, you're talking about the race where a request is replied to
> immediately after the fuse_copy calls and before this gets called.
> Then when we get here, we can't differentiate between whether
> FR_PROCESSING was set by the timeout handler or the reply handler.
>
> I think the simplest way around this is to check if the FR_SENT flag
> was cleared (the reply handler clears it while holding the fpq lock
> where FR_PROCESSING gets set and the timeout handler doesn't clear
> it), then return -ETIME if it wasn't and 0 if it was.
>
> I'll add this into v2.
>
> > Additionally if we have FR_PROCESSING set from the timeout, shouldn't this
> > cleanup have been done already? I don't understand why we need to handle this
> > here, we should just return and whoever is waiting on the request will get the
> > error.
>
> In most cases yes, but there is a race where the timeout handler may
> finish executing before the logic in dev_do_read that adds the request
> to the fpq lists. If this happens, the freed request will remain on
> the list.
>
> I think this race currently exists prior to these changes as well -

amendment: this statement is not accurate. In the existing code, there
is no race between the reply handler and dev_do_read, because the
reply handler can only handle the request once the request is on the
fpq->processing list. (We do need to account for this race with the
timeout handler though since the timeout handler can get triggered at
any time).

Also, while working on v2 I noticed we also need to handle races
between the timeout handler and requests being re-sent
(fuse_resend()). This will get addressed in v2.

> in the case you mentioned above where the request may have been
> completed right after the fuse_copy calls in dev_do_read and before
> dev_do_read moves the request to the fpq lists. We would get into the
> same situation with a freed request still on the list.
>
>
> Thanks,
> Joanne
>
> > Thanks,
> >
> > Josef
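
For reference, the FR_SENT check I mentioned above would look roughly like
this at the end of fuse_dev_do_read() (untested sketch; the exact shape may
still change once the fuse_resend() races are handled in v2):

        /* Check if the request timed out while we were copying it to userspace */
        if (test_bit(FR_PROCESSING, &req->flags)) {
                spin_lock(&fpq->lock);
                /*
                 * The reply handler clears FR_SENT under fpq->lock (the same
                 * place it sets FR_PROCESSING), and the timeout handler never
                 * clears it. So FR_SENT still being set here means
                 * FR_PROCESSING came from the timeout handler, not a reply.
                 */
                if (test_bit(FR_SENT, &req->flags)) {
                        if (!test_bit(FR_PRIVATE, &req->flags))
                                list_del_init(&req->list);
                        spin_unlock(&fpq->lock);
                        fuse_put_request(req);
                        return -ETIME;
                }
                spin_unlock(&fpq->lock);
                /* otherwise the request was answered, fall through as usual */
        }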