Re: MDS: what do FILE_SHARED and FILE_EXCLUSIVE caps actually represent?

On Wed, Mar 13, 2019 at 4:12 AM Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
>
> On Tue, 2019-03-12 at 18:08 +0000, Sage Weil wrote:
> > On Tue, 12 Mar 2019, Gregory Farnum wrote:
> > > On Tue, Mar 12, 2019 at 9:46 AM Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
> > > >
> > > > I have more questions about MDS caps. The File (F*) caps in cephfs are
> > > > very granular, such that it's not clear what extra ability each one
> > > > grants with respect to the others. Here's the list:
> > > >
> > > > #define CEPH_CAP_FILE_SHARED   (CEPH_CAP_GSHARED   << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_EXCL     (CEPH_CAP_GEXCL     << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_CACHE    (CEPH_CAP_GCACHE    << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_RD       (CEPH_CAP_GRD       << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_WR       (CEPH_CAP_GWR       << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_BUFFER   (CEPH_CAP_GBUFFER   << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_WREXTEND (CEPH_CAP_GWREXTEND << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_LAZYIO   (CEPH_CAP_GLAZYIO   << CEPH_CAP_SFILE)
> > > >
> > > > My questions:
> > > >
> > > > 1) Why do we have SHARED and CACHE (and similarly EXCL and BUFFER)?
> > > > Shouldn't one imply the other? Under what circumstances would you issue
> > > > them independently of one another?
> > >
> > > CACHE and BUFFER are special kinds of caps. They apply only to the
> > > FILE inode cap, and they refer to whether you can cache or buffer data
> > > extents of the file. SHARED and EXCL refer to inode attributes and
> > > apply to every kind of cap.
> > > They certainly move together often (perhaps always?), but they are
> > > distinct because CACHE and BUFFER are "extra".
> > >
> > > > 2) My understanding (quite possibly wrong) is that RD and WR are really
> > > > there to cover the validity of the file layout. Should SHARED/EXCL imply
> > > > those as well?
> > >
> > > Ah, that's not right. RD and WR also apply only to File caps, and mean
> > > that you can read and write the file data from the OSDs.
>
> Ok.
>
> Just so I'm clear -- in what situations do we revoke Fr or Fw caps?
> Looking at the code it appears that that is done when the MDS is in
> recovery, and maybe when a client that doesn't handle inline data
> encounters such an inode?
>

The MDS revokes Frw when truncating a file. It revokes Fw when handling
a getattr (for the file size).

> > > They *do not*
> > > map on to the same things as SHARED and EXCL do! You can easily have
> > > multiple writers who each have Fsrw caps, meaning they can read and
> > > write to the data and have shared permissions on the file metadata.
> > > They don't have exclusive caps because there are multiple active
> > > writers changing state. This means, among other things, that they
> > > can't extend the file size without an MDS request; issuing one would
> > > temporarily revoke those Fs caps but I think not the Frw ones.
> >
> > nit: I think Fs wouldn't be issued in combination with Frw, since it would
> > imply that mtime and size are accurate, and at least the former can never
> > be true when Frw caps are issued to other clients.
> >
> > I think the way to think about this is Fs and Fx apply to the inode
> > attributes (mtime and size), while Fcrwb apply to file operations and
> > cached/buffered data.  They move in unison since they are obviously
> > related and both driven by the same filelock state in the MDS.
> >
>
> Ok, that makes a bit more sense, thanks.
>
> > > > 3) Is WREXTEND deprecated? The client seems to ignore it.
> > >
> > > Uh, I'm not familiar with that one, so I'm going with "yes".
> > > In fact sha1 ca6c8a7a1956691837948c38ff7c5b7c45f2a051 states
> > > "CEPH_CAP_FILE_WREXTEND is an unused bit, reuse it for
> > > CEPH_STAT_RSTAT".
> > >
> > > > 4) LAZYIO is there, but its semantics are not documented at all, AFAICT.
> > > > I get that it's supposed to relax ceph's caching semantics. Under what
> > > > circumstances _should_ the client invalidate cached dentries and inodes
> > > > when this is set? IOW, what are the lazyio "rules" ?
> > >
> > > I don't think these are very well specified, especially around dentry
> > > and inode caches. IIRC LAZYIO was created in anticipation of the
> > > proposed Linux LAZYIO extensions, but in CephFS applies mostly to
> > > cached file data to allow conflicting caches and buffers in situations
> > > where clients handle their own consistency bounds (ie, HPC
> > > applications where each client gets its own range of a file to play in
> > > and doesn't touch that of anybody else.)
> >
> > IIRC the Fl lazyio bit is basically analogous to Fc and Fb in that
> > it means the client is allowed to buffer/cache, but only on those
> > file handles that have enabled lazyio via the ioctl.  It's only
> > requested/wanted if there is at least one open file handle with lazyio
> > enabled, and the MDS is a bit more liberal in giving it to clients than Fc or
> > Fb because its users have opted to manage their client cache consistency
> > (writeback and invalidation) themselves.
> >
>
> Ok, so this is basically giving the application control over the
> caching? How does an application linked against libcephfs invalidate the
> cache when it detects that something has changed?
>
> Thanks again for the info!
> --
> Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
>
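For reference, here is a minimal sketch of how the shorthand used in this
thread (Fs, Fx, Fcrwb, Fl, Fsrw) maps onto the CEPH_CAP_FILE_* bits quoted
above. The numeric values of the generic bits and of CEPH_CAP_SFILE are as I
recall them from ceph_fs.h, so treat them as assumptions and check your own
tree. The helper file_caps_str() is hypothetical, and the letter "a" for
WREXTEND is likewise an assumption.

```c
/* Generic cap bits and the file shift. ASSUMPTION: values as recalled
 * from ceph_fs.h; verify against your own source tree. */
#define CEPH_CAP_GSHARED     1
#define CEPH_CAP_GEXCL       2
#define CEPH_CAP_GCACHE      4
#define CEPH_CAP_GRD         8
#define CEPH_CAP_GWR        16
#define CEPH_CAP_GBUFFER    32
#define CEPH_CAP_GWREXTEND  64
#define CEPH_CAP_GLAZYIO   128
#define CEPH_CAP_SFILE       8

#define CEPH_CAP_FILE_SHARED (CEPH_CAP_GSHARED << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_EXCL   (CEPH_CAP_GEXCL   << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_CACHE  (CEPH_CAP_GCACHE  << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_RD     (CEPH_CAP_GRD     << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_WR     (CEPH_CAP_GWR     << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_BUFFER (CEPH_CAP_GBUFFER << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_LAZYIO (CEPH_CAP_GLAZYIO << CEPH_CAP_SFILE)

/* Hypothetical helper: render the file portion of a cap mask in the
 * "Fsrw" shorthand. buf must have room for at least 10 bytes
 * ('F' + up to 8 letters + NUL). Returns buf. */
static const char *file_caps_str(unsigned caps, char *buf)
{
    /* One letter per generic bit, lowest bit first:
     * shared, excl, cache, rd, wr, buffer, wrextend, lazyio. */
    static const char letters[] = "sxcrwbal";
    unsigned generic = (caps >> CEPH_CAP_SFILE) & 0xff;
    char *p = buf;
    int i;

    *p++ = 'F';
    for (i = 0; i < 8; i++) {
        if (generic & (1u << i))
            *p++ = letters[i];
    }
    *p = '\0';
    return buf;
}

/* Clearing bits from an issued mask is all that "revoking" means at the
 * bitmask level; the MDS/client protocol around it is not modelled here. */
static unsigned revoke_caps(unsigned issued, unsigned revoked)
{
    return issued & ~revoked;
}
```

With these definitions, file_caps_str() on
CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_RD | CEPH_CAP_FILE_WR gives "Fsrw",
and passing that mask through revoke_caps() with CEPH_CAP_FILE_WR leaves
"Fsr".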


