Re: ceph-fuse remount issues

> On 20 Feb 2015, at 06:23, John Spray <john.spray@xxxxxxxxxx> wrote:
> 
> 
> Background: a while ago, we found (#10277) that existing cache expiration mechanism wasn't working with latest kernels.  We used to invalidate the top level dentries, which caused fuse to invalidate everything, but an implementation detail in fuse caused it to start ignoring our repeated invalidate calls, so this doesn't work any more.  To persuade fuse to dirty its entire metadata cache, Zheng added in a system() call to "mount -o remount" after we expire things from our client side cache.
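
The hack John describes boils down to roughly the following (simplified sketch, not the actual ceph-fuse code; "/mnt/cephfs" is just a placeholder mount point):

#include <stdio.h>
#include <stdlib.h>

/* Simplified sketch of the remount hack described above; the real
 * ceph-fuse code differs.  Forcing a remount persuades the kernel's
 * fuse layer to treat its cached dentries/attributes as stale. */
static int remount_to_invalidate(const char *mountpoint)
{
    char cmd[4096];

    snprintf(cmd, sizeof(cmd), "mount -o remount %s", mountpoint);
    return system(cmd);   /* needs root, hence #10542 */
}

int main(void)
{
    return remount_to_invalidate("/mnt/cephfs");
}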

The change of d_invalidate()'s implementation breaks our old cache expiration mechanism. When invalidating a dentry, d_invalidate() also walks the dentry subtree and tries to prune any unused descendant dentries. Our old cache expiration mechanism relies on this to prune unused dentries: we invalidate the top-level dentries, and d_invalidate() tries to prune the unused dentries underneath them. Prior to the 3.18 kernel, d_invalidate() could fail if the dentry was still in use by someone. The implementation of d_invalidate() changed in 3.18: it now always succeeds and unhashes the dentry even if it is still in use. This behavior change means we can no longer use d_invalidate() at will. One known bad consequence is that the getcwd() system call returns -EINVAL after a process's working directory gets invalidated.
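
A quick way to see the getcwd() symptom (sketch only; the path is a placeholder and the sleep just stands in for waiting until the MDS sends cache pressure and the client invalidates the working directory's dentry):

#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[PATH_MAX];

    /* placeholder directory on a ceph-fuse mount */
    if (chdir("/mnt/cephfs/somedir") < 0) {
        perror("chdir");
        return 1;
    }

    /* wait here for cache pressure to invalidate our working directory */
    sleep(120);

    if (!getcwd(buf, sizeof(buf)))
        fprintf(stderr, "getcwd: %s\n", strerror(errno)); /* -EINVAL case */
    else
        printf("cwd: %s\n", buf);
    return 0;
}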

The cephfs kernel client has no such issue because it maintains its own per-session cap list. When it receives a cache pressure message from the MDS, it can iterate the list and prune unused caps.
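
Conceptually that is just the following (userspace sketch with made-up types, not the actual fs/ceph code):

#include <stdlib.h>

/* Made-up, simplified model of a per-session cap list; the real
 * structures in fs/ceph look different. */
struct cap {
    struct cap *next;
    int refcount;          /* held by open files, cwd, etc. */
};

struct session {
    struct cap *caps;      /* per-session list of caps */
};

/* On an MDS cache pressure message: walk the session's cap list and
 * release only the caps nobody is using. */
static void trim_unused_caps(struct session *s)
{
    struct cap **pp = &s->caps;

    while (*pp) {
        struct cap *c = *pp;

        if (c->refcount == 0) {
            *pp = c->next;   /* unlink and release */
            free(c);
        } else {
            pp = &c->next;   /* still in use, keep it */
        }
    }
}

The point being that, unlike the post-3.18 d_invalidate(), nothing that is still referenced ever gets dropped.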

> 
> However, this was a bit of a hack and has created problems:
> * You can't call mount -o remount unless you're root, so we are less flexible than we used to be (#10542)
> * While the remount is happening, unmounts sporadically fail and the fuse process can become unresponsive to SIGKILL (#10916)
> 
> The first issue was maybe an acceptable compromise, but the second issue is just painful, and it seems like we might not have seen the last of the knock on effects -- upstream maintainers certainly aren't expecting filesystems to remount themselves quite so frequently.
> 
> We probably have an opportunity to get something upstream in fuse to support a direct call to trigger the invalidation we want, if we can work out what that should look like.  Thoughts?
> 
> John
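
For comparison, the per-entry/per-inode invalidation calls that already exist in the libfuse lowlevel API look like this (libfuse 2.9-era signatures; channel setup omitted, "parent"/"ino"/"name" are placeholders). The remount hack is effectively standing in for a whole-cache variant of these:

#define FUSE_USE_VERSION 26
#include <fuse/fuse_lowlevel.h>
#include <string.h>

static void invalidate_one(struct fuse_chan *ch, fuse_ino_t parent,
                           const char *name, fuse_ino_t ino)
{
    /* drop the kernel's cached dentry parent/name */
    fuse_lowlevel_notify_inval_entry(ch, parent, name, strlen(name));

    /* negative offset: invalidate cached attributes only for the inode */
    fuse_lowlevel_notify_inval_inode(ch, ino, -1, 0);
}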




