On Fri, May 24, 2024 at 02:40:28PM +0800, Jingbo Xu wrote:
> Background
> ==========
> The fd of '/dev/fuse' serves as a message transmission channel between
> FUSE filesystem (kernel space) and fuse server (user space). Once the
> fd gets closed (intentionally or unintentionally), the FUSE filesystem
> gets aborted, and any attempt of filesystem access gets -ECONNABORTED
> error until the FUSE filesystem finally umounted.
>
> It is one of the requisites in production environment to provide
> uninterruptible filesystem service. The most straightforward way, and
> maybe the most widely used way, is that make another dedicated user
> daemon (similar to systemd fdstore) keep the device fd open. When the
> fuse daemon recovers from a crash, it can retrieve the device fd from the
> fdstore daemon through socket takeover (Unix domain socket) method [1]
> or pidfd_getfd() syscall [2]. In this way, as long as the fdstore
> daemon doesn't exit, the FUSE filesystem won't get aborted once the fuse
> daemon crashes, though the filesystem service may hang there for a while
> when the fuse daemon gets restarted and has not been completely
> recovered yet.
>
> This picture indeed works and has been deployed in our internal
> production environment until the following issues are encountered:
>
> 1. The fdstore daemon may be killed by mistake, in which case the FUSE
> filesystem gets aborted and irrecoverable.

That's only a problem if you use the fdstore of the per-user instance.
The main fdstore is part of PID 1 and you can't kill that. So really,
systemd needs to hand the fds from the per-user instance to the main
fdstore.

> 2. In scenarios of containerized deployment, the fuse daemon is deployed
> in a container POD, and a dedicated fdstore daemon needs to be deployed
> for each fuse daemon. The fdstore daemon could consume a amount of
> resources (e.g. memory footprint), which is not conducive to the dense
> container deployment.
>
> 3. Each fuse daemon implementation needs to implement its own fdstore
> daemon. If we implement the fuse recovery mechanism on the kernel side,
> all fuse daemon implementations could reuse this mechanism.

You can just use the global fdstore. That is a design limitation, not an
inherent limitation.
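
For reference, the "socket takeover" method mentioned in the quoted
background boils down to passing the fd over a Unix domain socket with
SCM_RIGHTS. Below is a minimal sketch of that idea only. It is not the
systemd fdstore protocol and not the proposed kernel mechanism. The
helper names store_fd(), retrieve_fd() and demo() are made up for
illustration, and a pipe stands in for the '/dev/fuse' fd so that the
example runs without a FUSE mount. It assumes Python 3.9 or later for
socket.send_fds() and socket.recv_fds().

```python
import os
import socket


def store_fd(sock, fd):
    """Hypothetical helper: hand one fd to the peer via SCM_RIGHTS."""
    # At least one byte of ordinary data must accompany the ancillary data.
    socket.send_fds(sock, [b"fd"], [fd])


def retrieve_fd(sock):
    """Hypothetical helper: receive one fd previously sent by the peer."""
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    return fds[0]


def demo():
    # Both ends live in one process here; in practice one end would belong
    # to the fd-holding daemon and the other to the fuse daemon.
    holder, daemon = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    # Assumption: a pipe stands in for the '/dev/fuse' fd.
    read_end, write_end = os.pipe()

    store_fd(daemon, read_end)
    # Simulate the daemon losing its copy. The open file description stays
    # alive because a reference is still held by the in-flight message.
    os.close(read_end)

    recovered = retrieve_fd(holder)
    os.write(write_end, b"ping")
    data = os.read(recovered, 4)

    for fd in (recovered, write_end):
        os.close(fd)
    holder.close()
    daemon.close()
    return data


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the channel survives as long as some
process (or an in-flight message) still holds a reference to the open
file, which is the property the fdstore approach relies on.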