On 2023/11/2 at 11:17 PM, Miklos Szeredi wrote:
On Tue, 31 Oct 2023 at 15:41, 赵晨 <winters.zc@xxxxxxxxxxxx> wrote:
After the fuse daemon crashes, the fuse mount point becomes inaccessible.
In some production environments, a watchdog daemon is used to preserve
the FUSE connection's file descriptor (fd). When the FUSE daemon crashes,
a new FUSE daemon is restarted and takes over the fd from the watchdog
daemon, allowing it to continue providing services.
However, if any inflight requests are lost during the crash, the user
process becomes stuck as it does not receive any replies.
To resolve this issue, this patchset introduces two sysfs APIs that enable
flushing or resending these pending requests for recovery. The flush
operation ends the pending request and returns an error to the
application, allowing the stuck user process to recover. While returning
an error may not be suitable for all scenarios, the resend API can be used
to resend these pending requests.
When using the resend API, the FUSE daemon needs to record the requests
it has already processed and avoid processing duplicate non-idempotent
requests, to prevent potential consistency issues.
Do we need both the resend and the flush APIs? I think the flush
functionality can easily be implemented with the resend API, no?
Thanks,
Miklos
Thank you for your response, Miklos.
Yes, it is possible to implement flush functionality using the resend
API. However, flush offers additional convenience.
For instance, some FUSE daemons are willing to discard requests to keep
user processes from hanging on I/O, but do not want to handle duplicate
requests. With resend, such daemons would need extra effort to keep a
persistent record of the requests they have already processed.
In such cases, the flush API would be more convenient.
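To illustrate the kind of persistent record meant here, below is a
minimal sketch in Python. It is not code from this patchset. It assumes
that a resent request carries the same unique ID as the original, and
the names CompletedLog and handle, as well as the log file layout, are
hypothetical. The record has to live on disk because the daemon that
processed the original request is the one that crashed.

```python
import os
import struct


class CompletedLog:
    """Append-only on-disk record of request IDs already applied.

    Hypothetical helper: each record is one 64-bit little-endian ID.
    """

    _REC = struct.Struct("<Q")

    def __init__(self, path):
        self.seen = set()
        if os.path.exists(path):
            with open(path, "rb") as f:
                data = f.read()
            # Drop a torn trailing record left by a crash mid-write.
            usable = len(data) - len(data) % self._REC.size
            if usable != len(data):
                os.truncate(path, usable)
            for off in range(0, usable, self._REC.size):
                self.seen.add(self._REC.unpack_from(data, off)[0])
        self._fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)

    def mark_done(self, unique):
        # Persist the ID before reporting it as done.
        os.write(self._fd, self._REC.pack(unique))
        os.fsync(self._fd)
        self.seen.add(unique)

    def close(self):
        os.close(self._fd)


def handle(log, unique, apply_op):
    """Apply a non-idempotent operation at most once per recorded ID.

    Returns True if apply_op ran, False if the ID was already recorded.
    """
    if unique in log.seen:
        return False
    apply_op()
    log.mark_done(unique)
    return True
```

Note that this sketch still leaves a window: if the daemon crashes after
apply_op() but before mark_done(), the resent request is applied twice.
Closing that window needs the record and the operation to be committed
atomically, which is the extra effort that flush lets a daemon avoid.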
So, based on my understanding, resend is adequate, but flush can offer
more convenience. I would like to inquire about your preference
regarding the two APIs. Should I do some verification and remove the
flush API, and then resend this patchset?
Best Regards,
Zhao Chen