On 14/12/2021 15:04, Miklos Szeredi wrote:
On Tue, 14 Dec 2021 at 13:58, Robert Vasek <rvasek01@xxxxxxxxx> wrote:
Hello fuse-devel,
I'd like to ask about the feasibility of adding a reconnect feature to the FUSE kernel module.
The idea is that when a FUSE driver disconnects (process exited due to a bug, signal, etc.), all pending and future ops for that session would wait for that driver to appear again, and then continue as normal. The wait would be bounded by a timer, with ENOTCONN returned if it expires. Obviously, "continue as normal" isn't possible for all FUSE drivers, as it depends on what they do and how they implement things, so they would have to opt in to this feature.
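To make the proposed semantics concrete, here is a minimal userspace sketch in Python. It is only a toy model of "ops wait for the driver, then either continue or fail with ENOTCONN"; it is not kernel code and is not taken from any proposed patch. The class name ReconnectableSession, its methods, and the reconnect_timeout parameter are all hypothetical names invented for this illustration.

```python
import errno
import threading


class ReconnectableSession:
    """Toy model of a session whose ops wait for a disconnected driver."""

    def __init__(self, reconnect_timeout):
        self._cond = threading.Condition()
        self._connected = True
        # Seconds to wait for the driver to come back (hypothetical tunable).
        self._timeout = reconnect_timeout

    def disconnect(self):
        # Models the driver process going away (bug, signal, etc.).
        with self._cond:
            self._connected = False

    def reconnect(self):
        # Models a restarted driver attaching to the same session.
        with self._cond:
            self._connected = True
            self._cond.notify_all()  # wake every op parked on the session

    def run_op(self, op):
        with self._cond:
            # wait_for() returns False if the timeout expired before
            # the predicate became true.
            if not self._cond.wait_for(lambda: self._connected, self._timeout):
                raise OSError(errno.ENOTCONN, "driver did not reconnect in time")
        # The op runs outside the lock; a real implementation would also
        # have to handle the driver vanishing again at this point.
        return op()
```

With this model, an op submitted while the driver is gone blocks in run_op() and completes normally if reconnect() is called before the timer expires; otherwise the caller sees OSError with errno set to ENOTCONN. Whether a given driver can actually "continue as normal" after that is exactly the opt-in question raised above and is not modeled here.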
A kernel patch[1] as well as example userspace code[2] has already
been proposed.
[1] https://lore.kernel.org/linux-fsdevel/CAPm50a+j8UL9g3UwpRsye5e+a=M0Hy7Tf1FdfwOrUUBWMyosNg@xxxxxxxxxxxxxx/
[2] https://lore.kernel.org/linux-fsdevel/CAPm50aLuK8Smy4NzdytUPmGM1vpzokKJdRuwxawUDA4jnJg=Fg@xxxxxxxxxxxxxx/
The example recovery is not very practical, but I can see how it could
be extended to a read-only fs.
There has also been some related work in the paper
"Refuse to Crash with Re-FUSE"
https://research.cs.wisc.edu/wind/Publications/refuse-eurosys11.pdf
https://eurosys2011.cs.uni-salzburg.at/pdf/eurosys2011-sundararaman.pdf
The paper gives some insight into the challenges associated with
restarting, and it seems to have worked better for them than I would have
thought. I'm not sure whether any source code for their work is available
to reproduce their findings, though.