Re: Why does the kernel client of cephfs need to keep multiple caps issued from different MDSes for the same inode?

Gregory Farnum <gfarnum@xxxxxxxxxx> · Thu, 20 Sep 2018 10:40:25 -0700

On Thu, Sep 20, 2018 at 4:22 AM, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
> Hi, everyone.
> I've been trying to read the source code of cephfs' kernel client, and
> I found that the kernel client uses ceph_inode_info::i_caps to store
> caps issued from different MDSes. Why? Why would a single inode have
> different caps returned by different MDSes? It's really confusing.
> Please help me, thanks:-)

While MDSes *mostly* manage their metadata independently, they do need
to share state at the boundaries. E.g., if mds.0 is responsible for
directory /foo, and mds.1 is responsible for directory /foo/bar, they
both need to know some of the state about /foo/bar (since mds.0 owns
its actual dentry and linkage into /foo). So a client may have
capabilities from mds.1 that let it work inside of /foo/bar, but need
to get another capability from mds.0 in order to move /foo/bar into a
different directory, /baz.
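
To make the /foo/bar example concrete, here is a minimal toy model of an
inode that tracks caps per issuing MDS. It is only a sketch, not the
kernel client's code: the Inode class, its methods, and the CAP_* bit
values are hypothetical names made up for illustration, and the way the
bits are split between mds.0 and mds.1 simply mirrors the example above.

```python
from dataclasses import dataclass, field

# Hypothetical capability bits for illustration only;
# these are NOT the real CEPH_CAP_* values.
CAP_READ = 1
CAP_WRITE = 2
CAP_MOVE = 4


@dataclass
class Inode:
    """Toy stand-in for a client-side inode (assumed structure)."""
    ino: int
    # One entry per MDS that issued caps for this inode: MDS rank -> bits.
    caps: dict = field(default_factory=dict)

    def add_cap(self, mds: int, bits: int) -> None:
        # Merge newly issued bits into whatever this MDS already issued.
        self.caps[mds] = self.caps.get(mds, 0) | bits

    def issued(self) -> int:
        # What the client may do overall is the union across all MDSes.
        total = 0
        for bits in self.caps.values():
            total |= bits
        return total

    def holders(self, want: int) -> list:
        # Which MDSes issued every bit in `want`?
        return sorted(m for m, b in self.caps.items() if b & want == want)


bar = Inode(ino=0x1234)                 # stands for /foo/bar
bar.add_cap(1, CAP_READ | CAP_WRITE)    # mds.1: work inside /foo/bar
bar.add_cap(0, CAP_MOVE)                # mds.0: owns the dentry in /foo
```

Keeping one entry per MDS, rather than a single merged bitmask, means the
client still knows which server to talk to when a particular cap has to
be released or renewed.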

And other times, metadata migrates between the servers, and the client
needs to be able to handle that change in ownership.
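
A rough sketch of what handling such a change in ownership could look
like on the client side, assuming caps are kept in a per-MDS table. The
function name migrate_cap and the dict layout are hypothetical, and this
ignores everything the real protocol has to deal with (sequence numbers,
races, messages arriving out of order).

```python
def migrate_cap(caps: dict, old_mds: int, new_mds: int) -> dict:
    """Move the cap bits held from old_mds over to new_mds.

    `caps` maps MDS rank -> issued bits (an assumed, simplified layout).
    The table is updated in place and also returned for convenience.
    """
    bits = caps.pop(old_mds, 0)
    if bits:
        # Merge with anything new_mds had already issued for this inode.
        caps[new_mds] = caps.get(new_mds, 0) | bits
    return caps


# The inode's metadata moves from mds.1 to mds.2; mds.0 is unaffected.
caps = {0: 0b001, 1: 0b110}
migrate_cap(caps, old_mds=1, new_mds=2)
```

The point of the per-MDS table is that the set of issued bits stays the
same across the move; only the record of which server issued them
changes.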
-Greg