Re: [RFC 0/2] fuse: introduce fuse server recovery mechanism

Christian Brauner <brauner@xxxxxxxxxx> · Tue, 28 May 2024 10:38:31 +0200

On Fri, May 24, 2024 at 02:40:28PM +0800, Jingbo Xu wrote:
> Background
> ==========
> The fd of '/dev/fuse' serves as a message transmission channel between
> FUSE filesystem (kernel space) and fuse server (user space). Once the
> fd gets closed (intentionally or unintentionally), the FUSE filesystem
> gets aborted, and any attempt at filesystem access fails with
> -ECONNABORTED until the FUSE filesystem is finally unmounted.
> 
> Providing uninterruptible filesystem service is a requirement in
> production environments.  The most straightforward way, and maybe the
> most widely used one, is to have another dedicated user daemon (similar
> to systemd fdstore) keep the device fd open.  When the
> fuse daemon recovers from a crash, it can retrieve the device fd from the
> fdstore daemon through socket takeover (Unix domain socket) method [1]
> or pidfd_getfd() syscall [2].  In this way, as long as the fdstore
> daemon doesn't exit, the FUSE filesystem won't get aborted once the fuse
> daemon crashes, though the filesystem service may hang there for a while
> when the fuse daemon gets restarted and has not been completely
> recovered yet.
> 
> This scheme indeed works and was deployed in our internal production
> environment until the following issues were encountered:
> 
> 1. The fdstore daemon may be killed by mistake, in which case the FUSE
> filesystem gets aborted and becomes irrecoverable.

That's only a problem if you use the fdstore of the per-user instance.
The main fdstore is part of PID 1 and you can't kill that. So really,
systemd needs to hand the fds from the per-user instance to the main
fdstore.
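
For illustration only, a minimal sketch of the handoff being discussed:
a daemon parks an fd with a holder over a Unix socket (SCM_RIGHTS) and
the fd stays usable after the daemon's own copy is gone. This is not
systemd's implementation. The socketpair stands in for the notify
socket, a pipe stands in for the /dev/fuse fd, and store_fd(),
retrieve_fd() and demo() are made-up names. A real service would use
sd_pid_notify_with_fds() with FDSTORE=1 and pick the fds up again via
sd_listen_fds() after restart.

```python
import os
import socket


def store_fd(sock, fd, name):
    """Send `fd` to the holder, tagged the way an fdstore message is."""
    msg = f"FDSTORE=1\nFDNAME={name}".encode()
    # socket.send_fds() wraps sendmsg() with an SCM_RIGHTS control message.
    socket.send_fds(sock, [msg], [fd])


def retrieve_fd(sock):
    """Receive one fd plus its tag on the holder side."""
    data, fds, _flags, _addr = socket.recv_fds(sock, 4096, 1)
    return data.decode(), fds[0]


def demo():
    # Assumption: a socketpair replaces the real notify socket.
    holder, daemon = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    # Assumption: a pipe's write end replaces the /dev/fuse fd.
    r, w = os.pipe()

    store_fd(daemon, w, "fuse")
    os.close(w)        # the daemon "crashes": its own copy is gone
    daemon.close()

    # The holder still has a reference, so the file stays open.
    msg, kept = retrieve_fd(holder)
    os.write(kept, b"alive")
    out = os.read(r, 5)

    os.close(kept)
    os.close(r)
    holder.close()
    return msg, out


if __name__ == "__main__":
    print(demo())
```

The point is that the open file lives on as long as any process holds a
reference to it, so which process holds it is a policy choice.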

> 2. In scenarios of containerized deployment, the fuse daemon is deployed
> in a container POD, and a dedicated fdstore daemon needs to be deployed
> for each fuse daemon.  The fdstore daemon consumes a certain amount of
> resources (e.g. memory footprint), which is not conducive to dense
> container deployment.
> 
> 3. Each fuse daemon implementation needs to implement its own fdstore
> daemon.  If we implement the fuse recovery mechanism on the kernel side,
> all fuse daemon implementations could reuse this mechanism.

You can just use the global fdstore. That is a design limitation, not an
inherent limitation.