Re: MDS: what do FILE_SHARED and FILE_EXCLUSIVE caps actually represent?

On Tue, 12 Mar 2019, Gregory Farnum wrote:
> On Tue, Mar 12, 2019 at 9:46 AM Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
> >
> > I have more questions about MDS caps. The File (F*) caps in cephfs are
> > very granular, such that it's not clear what extra ability each one
> > grants with respect to the others. Here's the list:
> >
> > #define CEPH_CAP_FILE_SHARED   (CEPH_CAP_GSHARED   << CEPH_CAP_SFILE)
> > #define CEPH_CAP_FILE_EXCL     (CEPH_CAP_GEXCL     << CEPH_CAP_SFILE)
> > #define CEPH_CAP_FILE_CACHE    (CEPH_CAP_GCACHE    << CEPH_CAP_SFILE)
> > #define CEPH_CAP_FILE_RD       (CEPH_CAP_GRD       << CEPH_CAP_SFILE)
> > #define CEPH_CAP_FILE_WR       (CEPH_CAP_GWR       << CEPH_CAP_SFILE)
> > #define CEPH_CAP_FILE_BUFFER   (CEPH_CAP_GBUFFER   << CEPH_CAP_SFILE)
> > #define CEPH_CAP_FILE_WREXTEND (CEPH_CAP_GWREXTEND << CEPH_CAP_SFILE)
> > #define CEPH_CAP_FILE_LAZYIO   (CEPH_CAP_GLAZYIO   << CEPH_CAP_SFILE)
> >
> > My questions:
> >
> > 1) Why do we have SHARED and CACHE (and similarly EXCL and BUFFER)?
> > Shouldn't one imply the other? Under what circumstances would you issue
> > them independently of one another?
> 
> CACHE and BUFFER are special kinds of caps. They apply only to the
> FILE inode cap, and they refer to whether you can cache or buffer data
> extents of the file. SHARED and EXCL refer to inode attributes and
> apply to every kind of cap.
> They certainly move together often (perhaps always?), but they are
> distinct because CACHE and BUFFER are "extra".
> 
> > 2) My understanding (quite possibly wrong) is that RD and WR are really
> > there to cover the validity of the file layout. Should SHARED/EXCL imply
> > those as well?
> 
> Ah, that's not right. RD and WR also apply only to File caps, and mean
> that you can read and write the file data from the OSDs. They *do not*
> map on to the same things as SHARED and EXCL do! You can easily have
> multiple writers who each have Fsrw caps, meaning they can read and
> write to the data and have shared permissions on the file metadata.
> They don't have exclusive caps because there are multiple active
> writers changing state. This means, among other things, that they
> can't extend the file size without an MDS request; issuing one would
> temporarily revoke those Fs caps but I think not the Frw ones.

nit: I think Fs wouldn't be issued in combination with Frw, since Fs would 
imply that mtime and size are accurate, and at least the former can never 
be true when Frw caps are issued to other clients.

I think the way to think about this is Fs and Fx apply to the inode 
attributes (mtime and size), while Fcrwb apply to file operations and 
cached/buffered data.  They move in unison since they are obviously 
related and both driven by the same filelock state in the MDS.
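
To make that split concrete, here is a minimal sketch in C. The generic
bit values, the File shift, and the one-letter names (s, x, c, r, w, b,
a, l) are as I recall them from ceph_fs.h and the client's cap-string
code, so treat them as assumptions and check them against your tree.
FILE_ATTR_MASK, FILE_DATA_MASK and file_caps_to_str() are hypothetical
names made up for illustration; they are not part of Ceph.

```c
#include <stddef.h>

/* Generic cap bits and File shift, as recalled from ceph_fs.h (assumption). */
#define CEPH_CAP_GSHARED     1
#define CEPH_CAP_GEXCL       2
#define CEPH_CAP_GCACHE      4
#define CEPH_CAP_GRD         8
#define CEPH_CAP_GWR        16
#define CEPH_CAP_GBUFFER    32
#define CEPH_CAP_GWREXTEND  64
#define CEPH_CAP_GLAZYIO   128
#define CEPH_CAP_SFILE       8

/* Hypothetical masks: Fs/Fx cover inode attributes (mtime, size) ... */
#define FILE_ATTR_MASK \
	((unsigned int)(CEPH_CAP_GSHARED | CEPH_CAP_GEXCL) << CEPH_CAP_SFILE)
/* ... while Fcrwb cover file operations and cached/buffered data. */
#define FILE_DATA_MASK \
	((unsigned int)(CEPH_CAP_GCACHE | CEPH_CAP_GRD | CEPH_CAP_GWR | \
			CEPH_CAP_GBUFFER) << CEPH_CAP_SFILE)

/*
 * Hypothetical helper: render the File cap bits of 'caps' as a string
 * such as "Fscr" into buf (at most len bytes, always NUL-terminated
 * when len > 0). Returns buf.
 */
static char *file_caps_to_str(unsigned int caps, char *buf, size_t len)
{
	static const char letters[] = "sxcrwbal"; /* bit 0 .. bit 7 */
	unsigned int fbits = (caps >> CEPH_CAP_SFILE) & 0xffu;
	size_t pos = 0;
	int i;

	if (len == 0)
		return buf;
	if (len > 1)
		buf[pos++] = 'F';
	for (i = 0; i < 8; i++) {
		/* keep one byte free for the terminating NUL */
		if ((fbits & (1u << i)) && pos + 1 < len)
			buf[pos++] = letters[i];
	}
	buf[pos] = '\0';
	return buf;
}
```

With these definitions, a mask holding only the RD and WR file bits
renders as "Frw" and has no bits in FILE_ATTR_MASK, which is the
multiple-writer case discussed above.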

> > 3) Is WREXTEND deprecated? The client seems to ignore it.
> 
> Uh, I'm not familiar with that one, so I'm going with "yes".
> In fact sha1 ca6c8a7a1956691837948c38ff7c5b7c45f2a051 states
> "CEPH_CAP_FILE_WREXTEND is an unused bit, reuse it for
> CEPH_STAT_RSTAT".
>
> > 4) LAZYIO is there, but its semantics are not documented at all, AFAICT.
> > I get that it's supposed to relax ceph's caching semantics. Under what
> > circumstances _should_ the client invalidate cached dentries and inodes
> > when this is set? IOW, what are the lazyio "rules" ?
> 
> I don't think these are very well specified, especially around dentry
> and inode caches. IIRC LAZYIO was created in anticipation of the
> proposed Linux LAZYIO extensions, but in CephFS applies mostly to
> cached file data to allow conflicting caches and buffers in situations
> where clients handle their own consistency bounds (i.e., HPC
> applications where each client gets its own range of a file to play in
> and doesn't touch that of anybody else.)

IIRC the Fl lazyio bit is basically analogous to Fc and Fb in that 
it means the client is allowed to buffer/cache, but only on those 
file handles that have enabled lazyio via the ioctl.  It's only 
requested/wanted if there is at least one open file handle with lazyio 
enabled, and the MDS is a bit more liberal in giving it to clients than Fc or 
Fb because its users have opted to manage their client cache consistency 
(writeback and invalidation) themselves.
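
As a rough illustration of that rule, here is a small C sketch. It is
my paraphrase of the behavior described above, not the actual client
code: struct open_handle, wanted_with_lazyio() and handle_may_cache()
are hypothetical names, and the bit values are as I recall them from
ceph_fs.h, so verify them before relying on this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bit values as recalled from ceph_fs.h (assumption). */
#define CEPH_CAP_GCACHE      4
#define CEPH_CAP_GLAZYIO   128
#define CEPH_CAP_SFILE       8
#define CEPH_CAP_FILE_CACHE  (CEPH_CAP_GCACHE  << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_LAZYIO (CEPH_CAP_GLAZYIO << CEPH_CAP_SFILE)

/* Hypothetical per-open-file state: did this handle enable lazyio? */
struct open_handle {
	bool lazyio;
};

/*
 * Fl is only wanted if at least one open handle has lazyio enabled;
 * otherwise it is cleared from the wanted mask.
 */
static unsigned int wanted_with_lazyio(unsigned int base_wanted,
				       const struct open_handle *handles,
				       size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (handles[i].lazyio)
			return base_wanted | (unsigned int)CEPH_CAP_FILE_LAZYIO;
	}
	return base_wanted & ~(unsigned int)CEPH_CAP_FILE_LAZYIO;
}

/*
 * A handle may cache reads if Fc is issued, or if Fl is issued and
 * this particular handle opted into lazyio.
 */
static bool handle_may_cache(unsigned int issued, const struct open_handle *h)
{
	if (issued & CEPH_CAP_FILE_CACHE)
		return true;
	return h->lazyio && (issued & CEPH_CAP_FILE_LAZYIO) != 0;
}
```

The same per-handle check would apply to buffering writes, with Fb in
place of Fc.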

sage


