Inquiry regarding FUSE filesystem behavior with bind mounts in Docker

Dear Maintainers,

I hope this email finds you well.

Recently, one of our customers encountered the following issue: a
directory from a FUSE filesystem (a local union filesystem) on the host
machine was shared inside a Docker container via a bind mount. After
the FUSE filesystem was updated (by killing the existing FUSE process,
replacing the binary, and starting a new FUSE process), every operation
on the shared directory inside the container failed with 'Transport
endpoint is not connected'. The only remedy was to restart the
container.
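
For reference, a trivial probe run inside the container demonstrates
the symptom; "/mnt/shared" below is a placeholder for the bind-mounted
directory:

/* probe.c - run inside the container after the host FUSE daemon has
 * been killed and restarted. "/mnt/shared" is a placeholder for the
 * bind-mounted FUSE directory. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

int main(void)
{
	struct stat st;

	if (stat("/mnt/shared", &st) != 0)
		/* Against the stale connection this prints:
		 * stat: Transport endpoint is not connected (errno=107) */
		printf("stat: %s (errno=%d)\n", strerror(errno), errno);
	else
		printf("stat: ok\n");
	return 0;
}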

After reproducing and debugging the issue, I found that it stems from
the check in fuse_get_req(): once the connection is marked dead, the
!fc->connected test makes request allocation fail with -ENOTCONN. I
understand this behavior, and it is reasonable in principle. However, I
still want to ask: is there a way to avoid this error?
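
For context, this is the check I am referring to, abridged from
fs/fuse/dev.c (the exact shape varies between kernel versions):

/* fs/fuse/dev.c (abridged): once the connection is marked dead,
 * every new request allocation fails with -ENOTCONN. */
static struct fuse_req *fuse_get_req(struct fuse_mount *fm, bool for_background)
{
	struct fuse_conn *fc = fm->fc;
	int err;
	...
	err = -ENOTCONN;
	if (!fc->connected)
		goto out;
	...
}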

I believe there are two aspects to this issue:

1. How a bind mount of a directory should be handled when the backing
FUSE process is killed and restarted.
2. How this interacts with the mount namespace set up inside Docker.

As a stopgap, I have implemented a rather hacky solution: re-associating
the old bind mount points with the new dentry of the updated FUSE
filesystem. Specifically, the steps are (a rough sketch in code follows
the list):

1. During the initial bind mount, record the path of the shared
directory in struct mountpoint.
2. After mounting the new FUSE filesystem, traverse (via an ioctl
command) all struct mount instances on the host that are still linked
to the stale super_block, locate the new dentry for the shared
directory using the recorded mnt_mp->path, and update the corresponding
mnt.mnt_sb and mnt.mnt_root accordingly.
3. During unmount, make sure the old mnt_sb and mnt_root are also
released properly.
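
To make the idea concrete, here is a rough sketch of step 2. It is
simplified pseudo-kernel code: for_each_stale_mount() and the path
field in struct mountpoint are helpers/fields added by my hack (they do
not exist upstream), and locking is omitted:

/* Sketch of step 2: for every bind mount still pointing at the stale
 * FUSE super_block, resolve the path recorded in step 1 on the new
 * filesystem and swap in the new dentry/superblock. Locking and error
 * unwinding are omitted; for_each_stale_mount() and mnt_mp->path are
 * assumptions of this sketch, not upstream APIs. */
static int fuse_rebind_stale_mounts(struct super_block *old_sb)
{
	struct mount *mnt;
	struct path new_path;
	int err;

	for_each_stale_mount(old_sb, mnt) {
		/* Resolve the recorded share path on the new filesystem. */
		err = kern_path(mnt->mnt_mp->path, LOOKUP_FOLLOW, &new_path);
		if (err)
			return err;

		/* Re-associate the bind mount with the new objects. The
		 * old mnt_root/mnt_sb references are kept and released
		 * later, at unmount time (step 3). */
		mnt->mnt.mnt_root = dget(new_path.dentry);
		mnt->mnt.mnt_sb = new_path.dentry->d_sb;

		path_put(&new_path);
	}
	return 0;
}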

I am aware that this approach is very hacky and likely prone to bugs.
However, it appears to work for now: I have been running the test case
above for two days, and no issues have surfaced yet.

My questions are as follows:

1. Is this issue worth addressing upstream? If so, I would be happy to
work on it and submit patches to the mainline kernel.
2. If not, is there an alternative way to avoid this problem in a more
stable and reliable manner?

In my opinion, solving this issue would further demonstrate the
flexibility and high availability of user-space filesystems.

Any feedback or guidance would be highly appreciated. I look forward
to hearing your thoughts.

Thank you for your time and support.


Best regards,
-- 
Julian Sun <sunjunchao2870@xxxxxxxxx>




