Re: [RFC] What if client fuse process crash?

Niels de Vos <ndevos@xxxxxxxxxx> · Tue, 6 Aug 2019 11:35:04 +0200

On Tue, Aug 06, 2019 at 04:47:46PM +0800, Changwei Ge wrote:
> Hi Niels,
> 
> On 2019/8/6 3:50 PM, Niels de Vos wrote:
> > On Tue, Aug 06, 2019 at 03:14:46PM +0800, Changwei Ge wrote:
> > > On 2019/8/6 2:57 PM, Ravishankar N wrote:
> > > > On 06/08/19 11:44 AM, Changwei Ge wrote:
> > > > > Hi Ravishankar,
> > > > > 
> > > > > 
> > > > > > Thanks for sharing; it's very useful to me.
> > > > > 
> > > > > > I have been setting up a glusterfs storage cluster recently, and the
> > > > > > umount/mount recovery process has been bothering me.
> > > > Hi Changwei,
> > > > > Why do you need to do frequent remounts? If your gluster fuse client
> > > > is crashing frequently, that should be investigated and fixed. If you
> > > > have a reproducer, please raise a bug with all the details like the
> > > > glusterfs version, core files and log files.
> > > 
> > > Hi Ravi,
> > > 
> > > > Actually, the glusterfs client fuse process runs well in my environment.
> > > > But high availability and fault tolerance are also big concerns of mine.
> > > > 
> > > > So I killed the fuse process to see what would happen. AFAIK, userspace
> > > > processes can be killed or can crash at any time, which is not under our
> > > > control. :-(
> > > 
> > > > Another scenario is a *software upgrade*: we have to upgrade the glusterfs
> > > > client version in order to get new features and bug fixes. It would be
> > > > friendlier to applications if the upgrade were transparent.
> > Open files have a state associated with them, and that state is lost
> > when the fuse process exits. A restarted fuse process will then need
> > to restore the state of the open files (and caches, and more). This is
> > not trivial and I do not think any work on this end has been done yet.
> 
> 
> True, tons of work has to be done if we want to restore all files' state so
> that the restarted fuse process continues to work as if it had never been
> restarted.
> 
> I suppose two methods might be feasible:
> 
>     One is to try to fetch the file state from the kernel and restore it
> into the fuse process;
> 
>     the other is to duplicate that state to a standby process, or just
> use the Linux shared memory mechanism.

Restoring the state from the kernel would be my preference. That is the
view of the storage that the application has as well. But it may not be
possible to recover all details that the xlators track. Storing those in
shared memory (or file-backed persistent storage) might not even be
sufficient. With upgrades it is possible to get new features in existing
xlators that would need to refresh their state to get the extensions. It
is even possible that new xlators get added, and those will need to get
the state of the files too.

I think that, in the end, it would boil down to getting the state from the
kernel and revalidating each inode through the mountpoint to the
server. This is also what happens on graph switches (a new volume layout
or new options pushed from the server to the client). To get this to
work, a FUSE service must be able to re-attach itself to a mountpoint
from which the previous FUSE process detached. I do not think this is
possible at the moment; it would require extensions in the FUSE kernel
module (and then re-attaching new state to all inodes).
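
To give a rough idea of what "the state from the kernel" looks like from
userspace today, here is a minimal Python sketch that lists the files
processes currently hold open below a mountpoint, with their offset and
open flags, by reading /proc/<pid>/fd and /proc/<pid>/fdinfo. The helper
names (parse_fdinfo, open_files_under) and the proc_root parameter are
made up for illustration; this is not a GlusterFS or FUSE API, and it
recovers none of the state that the xlators track (locks, caches, and so
on). It only shows the kind of inventory a restarted client would have to
start from.

```python
import os


def parse_fdinfo(text):
    """Parse the 'key:<tab>value' lines of a /proc/<pid>/fdinfo/<fd> file."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def open_files_under(mountpoint, proc_root="/proc"):
    """Yield (pid, fd, path, offset, flags) for files open below mountpoint.

    proc_root is a parameter only so the function can be pointed at a
    fake directory tree; on a real system it stays "/proc".
    """
    # A trailing separator avoids matching /mnt/vol2 when asked for /mnt/vol.
    prefix = os.path.join(os.path.abspath(mountpoint), "")
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
                if not target.startswith(prefix):
                    continue
                with open(os.path.join(proc_root, pid, "fdinfo", fd)) as f:
                    info = parse_fdinfo(f.read())
            except OSError:
                continue  # fd was closed while we were looking
            # The kernel prints 'flags' in octal.
            yield (int(pid), int(fd), target,
                   int(info.get("pos", "0")), int(info.get("flags", "0"), 8))
```

Calling list(open_files_under("/mnt/glusterfs")) (the path is only an
example) as root would give one tuple per open file. Seeing other users'
processes needs the appropriate privileges, and the result is a racy
snapshot, not a consistent view.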

> > Some users take an alternative route. Mounted filesystems do indeed have
> > issues with online updating. So, maybe you do not need to mount the
> > filesystem at all. Depending on the needs of your applications, using
> > glusterfs-coreutils instead of a FUSE (or NFS) mount might be an option
> > for you. The short-lived processes connect to the Gluster volume when
> > needed, and do not keep a connection open. Updating userspace tools is
> > much simpler than updating long-running processes that are hooked into
> > the kernel.
> > 
> > See https://github.com/gluster/glusterfs-coreutils for details.
> 
> 
> That's helpful, but I think then some POSIX file operations can't be
> performed anymore.

Indeed, glusterfs-coreutils is more of an object storage interface than
a POSIX-compliant filesystem.
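
To illustrate the "object storage" style of access, here is a small
Python sketch that fetches a whole file through a short-lived helper
process instead of through a mount. It assumes that the gfcat tool from
glusterfs-coreutils is installed and that it accepts a
glfs://host/volume/path URL; please check the project README for the
exact syntax. The function names glfs_url and read_object are made up
for this example.

```python
import subprocess


def glfs_url(host, volume, path):
    """Build a glfs:// URL (assumed format: glfs://host/volume/path)."""
    return "glfs://{}/{}/{}".format(host, volume, path.lstrip("/"))


def read_object(host, volume, path, tool="gfcat"):
    """Return the whole file content as bytes, using one short-lived process.

    No connection to the volume outlives this call, so there is no
    client-side state that an upgrade of the tools could invalidate.
    """
    result = subprocess.run(
        [tool, glfs_url(host, volume, path)],
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the tool fails
    )
    return result.stdout
```

The trade-off is the one mentioned above: whole-object reads and writes
work this way, but operations that depend on a long-lived open file
(locks, mmap, seeking within an open descriptor) do not.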

Niels

> 
> 
> Thanks,
> 
> Changwei
> 
> 
> > 
> > HTH,
> > Niels
> > 
> > 
> > > 
> > > Thanks,
> > > 
> > > Changwei
> > > 
> > > 
> > > > Regards,
> > > > Ravi
> > > > > 
> > > > > > I happened to find some patches[1] on the internet that aim to address
> > > > > > this problem, but I have no idea why they were never merged into
> > > > > > glusterfs mainline.
> > > > > 
> > > > > Do you know why?
> > > > > 
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > Changwei
> > > > > 
> > > > > 
> > > > > [1]:
> > > > > 
> > > > > https://review.gluster.org/#/c/glusterfs/+/16843/
> > > > > 
> > > > > https://github.com/gluster/glusterfs/issues/242
> > > > > 
> > > > > 
> > > > > > On 2019/8/6 1:12 PM, Ravishankar N wrote:
> > > > > > On 05/08/19 3:31 PM, Changwei Ge wrote:
> > > > > > > Hi list,
> > > > > > > 
> > > > > > > > If the glusterfs client fuse process somehow dies, all
> > > > > > > > subsequent file operations fail with the error 'no
> > > > > > > > connection'.
> > > > > > > > 
> > > > > > > > Is the only way to recover to umount and mount again?
> > > > > > > Yes, this is pretty much the case with all fuse-based file
> > > > > > > systems. You can use -o auto_unmount
> > > > > > > (https://review.gluster.org/#/c/17230/) to clean up automatically
> > > > > > > instead of having to unmount manually.
> > > > > > > > If so, that means all processes working on top of glusterfs
> > > > > > > > have to close their files, which is sometimes hard to
> > > > > > > > accept.
> > > > > > There is
> > > > > > https://research.cs.wisc.edu/wind/Publications/refuse-eurosys11.html,
> > > > > > which claims to provide a framework for transparent failovers. I
> > > > > > can't find any publicly available code though.
> > > > > > 
> > > > > > Regards,
> > > > > > Ravi
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > 
> > > > > > > Changwei
> > > > > > > 
> > > > > > > 
> > > > > > > _______________________________________________
> > > > > > > 
> > > > > > > Community Meeting Calendar:
> > > > > > > 
> > > > > > > APAC Schedule -
> > > > > > > Every 2nd and 4th Tuesday at 11:30 AM IST
> > > > > > > Bridge: https://bluejeans.com/836554017
> > > > > > > 
> > > > > > > NA/EMEA Schedule -
> > > > > > > Every 1st and 3rd Tuesday at 01:00 PM EDT
> > > > > > > Bridge: https://bluejeans.com/486278655
> > > > > > > 
> > > > > > > Gluster-devel mailing list
> > > > > > > Gluster-devel@xxxxxxxxxxx
> > > > > > > https://lists.gluster.org/mailman/listinfo/gluster-devel
> > > > > > > 