Re: [PATCH v4 0/2] fuse: add timeout option for requests

Joanne Koong <joannelkoong@xxxxxxxxx> · Thu, 22 Aug 2024 10:31:19 -0700

On Thu, Aug 22, 2024 at 3:52 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>
> On Wed, 21 Aug 2024 at 23:22, Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>
> > Without a kernel enforced timeout, the only way out of this is to
> > abort the connection. A userspace timeout wouldn't help in this case
> > with getting the server unstuck. With the kernel timeout, this forces
> > the kernel handling of the write request to proceed, which will drop
> > the folio lock and resume the server back to a functioning state.
> >
> > I don't think situations like this are uncommon. For example, it's not
> > obvious or clear to developers that fuse_lowlevel_notify_inval_inode()
> > shouldn't be called inside of a write handler in their server code.
>
> Documentation is definitely lacking.  In fact a simple rule is: never
> call a notification function from within a request handling function.
> Notifications are async events that should happen independently of
> handling regular operations.  Anything else is an abuse of the
> interface.
>
> >
> > For your concern about potential unintended side effects of timed out
> > requests without the server's knowledge, could you elaborate more on
> > the VFS locking example? In my mind, a request that times out is the
> > same thing as a request that behaves normally and completes with an
> > error code, but perhaps not?
>
> - user calls mknod(2) on fuse directory
> - VFS takes inode lock on parent directory
> - calls into fuse to create the file
> - fuse sends request to server
> - file creation is slow and times out in the kernel
> - fuse returns -ETIMEDOUT
> - VFS releases inode lock
> - meanwhile the server is still working on creating the file and has
> no idea that something went wrong
> - user calls the same mknod(2) again
> - same things happen as last time
> - server starts to create the file *again* knowing that the VFS takes
> care of concurrency
> - server crashes due to corruption

Thanks for the details.

For cases like these though, isn't the server already responsible for
handling errors properly to avoid potential corruption if their reply
to the request fails? In your example above, it seems like the server
would already need to have the error handling in place to roll back
the file creation if their fuse_reply_create() call returned an error
(e.g. -EIO if copying out args in the kernel had an issue). If the
request timed out, then the server would get back -ENOENT to their
reply.
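
To make the rollback pattern above concrete, here is a minimal,
self-contained sketch. It does not use libfuse: the toy_* store, the
reply_fn callback and handle_create() are all hypothetical names made up
for illustration. The callback only stands in for a reply call such as
fuse_reply_create(), which returns 0 on success or a negative errno. The
-ENOENT case models the reply to a timed-out request as described above;
that behavior is an assumption taken from this thread, not something the
sketch verifies.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the server's backing store. */
#define MAX_FILES 8
#define NAME_LEN  32

struct toy_store {
    char names[MAX_FILES][NAME_LEN];
    bool used[MAX_FILES];
};

/* Return the slot holding 'name', or -ENOENT if it is not present. */
static int toy_find(const struct toy_store *s, const char *name)
{
    for (int i = 0; i < MAX_FILES; i++)
        if (s->used[i] && strcmp(s->names[i], name) == 0)
            return i;
    return -ENOENT;
}

/* Create 'name'; return its slot, or a negative errno on failure. */
static int toy_create(struct toy_store *s, const char *name)
{
    if (strlen(name) >= NAME_LEN)
        return -ENAMETOOLONG;
    if (toy_find(s, name) >= 0)
        return -EEXIST;
    for (int i = 0; i < MAX_FILES; i++) {
        if (!s->used[i]) {
            strcpy(s->names[i], name);
            s->used[i] = true;
            return i;
        }
    }
    return -ENOSPC;
}

/* Undo a creation. */
static void toy_remove(struct toy_store *s, int slot)
{
    assert(slot >= 0 && slot < MAX_FILES);
    s->used[slot] = false;
}

/*
 * Hypothetical reply callback: stands in for the server's reply call.
 * Returns 0 on success or a negative errno if the reply was rejected.
 */
typedef int (*reply_fn)(int slot);

static int reply_ok(int slot)           { (void)slot; return 0; }
static int reply_request_gone(int slot) { (void)slot; return -ENOENT; }

/*
 * Create the file, then reply. If the reply fails (for example because
 * the request is no longer known to the kernel), roll the creation back
 * so that a retried create starts from a clean state.
 */
static int handle_create(struct toy_store *s, const char *name, reply_fn reply)
{
    int slot = toy_create(s, name);
    if (slot < 0)
        return slot;

    int err = reply(slot);
    if (err)
        toy_remove(s, slot);
    return err;
}
```

With a zero-initialized struct toy_store, calling
handle_create(&s, "a", reply_request_gone) returns -ENOENT and leaves no
"a" behind, so a later handle_create(&s, "a", reply_ok) succeeds rather
than hitting -EEXIST. Note that this only covers a server that checks
the return value of its reply; it does not address the window in which
the server is still working while the caller has already retried.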

Thanks,
Joanne

>
>
> > I think also, having some way for system admins to enforce request
> > timeouts across the board might be useful as well - for example, if a
> > malignant fuse server doesn't reply to any requests, the requests hog
> > memory until the server is killed.
>
> As I said, I'm not against enforcing a response time for fuse servers,
> as long as a timeout results in a complete abort and not just an error
> on the timed out request.
>
> Thanks,
> Miklos