On Wed, Mar 13, 2019 at 4:12 AM Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
>
> On Tue, 2019-03-12 at 18:08 +0000, Sage Weil wrote:
> > On Tue, 12 Mar 2019, Gregory Farnum wrote:
> > > On Tue, Mar 12, 2019 at 9:46 AM Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
> > > >
> > > > I have more questions about MDS caps. The File (F*) caps in cephfs are
> > > > very granular, such that it's not clear what extra ability each one
> > > > grants with respect to the others. Here's the list:
> > > >
> > > > #define CEPH_CAP_FILE_SHARED (CEPH_CAP_GSHARED << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_EXCL (CEPH_CAP_GEXCL << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_CACHE (CEPH_CAP_GCACHE << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_RD (CEPH_CAP_GRD << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_WR (CEPH_CAP_GWR << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_BUFFER (CEPH_CAP_GBUFFER << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_WREXTEND (CEPH_CAP_GWREXTEND << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_LAZYIO (CEPH_CAP_GLAZYIO << CEPH_CAP_SFILE)
> > > >
> > > > My questions:
> > > >
> > > > 1) Why do we have SHARED and CACHE (and similarly EXCL and BUFFER)?
> > > > Shouldn't one imply the other? Under what circumstances would you issue
> > > > them independently of one another?
> > >
> > > CACHE and BUFFER are special kinds of caps. They apply only to the
> > > FILE inode cap, and they refer to whether you can cache or buffer data
> > > extents of the file. SHARED and EXCL refer to inode attributes and
> > > apply to every kind of cap.
> > > They certainly move together often (perhaps always?), but they are
> > > distinct because CACHE and BUFFER are "extra".
> > >
> > > > 2) My understanding (quite possibly wrong) is that RD and WR are really
> > > > there to cover the validity of the file layout. Should SHARED/EXCL imply
> > > > those as well?
> > >
> > > Ah, that's not right. RD and WR also apply only to File caps, and mean
> > > that you can read and write the file data from the OSDs.
>
> Ok.
>
> Just so I'm clear -- in what situations do we revoke Fr or Fw caps?
> Looking at the code it appears that that is done when the MDS is in
> recovery, and maybe when a client that doesn't handle inline data
> encounters such an inode?
>

The MDS revokes Frw when truncating a file. The MDS revokes Fw when
handling a getattr (file size).

> > > They *do not*
> > > map on to the same things as SHARED and EXCL do! You can easily have
> > > multiple writers who each have Fsrw caps, meaning they can read and
> > > write to the data and have shared permissions on the file metadata.
> > > They don't have exclusive caps because there are multiple active
> > > writers changing state. This means, among other things, that they
> > > can't extend the file size without an MDS request; issuing one would
> > > temporarily revoke those Fs caps but I think not the Frw ones.
> >
> > nit: I think Fs wouldn't be issued in combination with Frw, since it would
> > imply that mtime and size are accurate, and at least the former can never
> > be true when Frw caps are issued to other clients.
> >
> > I think the way to think about this is Fs and Fx apply to the inode
> > attributes (mtime and size), while Fcrwb apply to file operations and
> > cached/buffered data. They move in unison since they are obviously
> > related and both driven by the same filelock state in the MDS.
> >
>
> Ok, that makes a bit more sense, thanks.
>
> > > > 3) Is WREXTEND deprecated? The client seems to ignore it.
> > >
> > > Uh, I'm not familiar with that one, so I'm going with "yes".
> > > In fact sha1 ca6c8a7a1956691837948c38ff7c5b7c45f2a051 states
> > > "CEPH_CAP_FILE_WREXTEND is an unused bit, reuse it for
> > > CEPH_STAT_RSTAT".
> > >
> > > > 4) LAZYIO is there, but its semantics are not documented at all, AFAICT.
> > > > I get that it's supposed to relax ceph's caching semantics. Under what
> > > > circumstances _should_ the client invalidate cached dentries and inodes
> > > > when this is set? IOW, what are the lazyio "rules"?
> > >
> > > I don't think these are very well specified, especially around dentry
> > > and inode caches. IIRC LAZYIO was created in anticipation of the
> > > proposed Linux LAZYIO extensions, but in CephFS applies mostly to
> > > cached file data to allow conflicting caches and buffers in situations
> > > where clients handle their own consistency bounds (i.e., HPC
> > > applications where each client gets its own range of a file to play in
> > > and doesn't touch that of anybody else.)
> >
> > IIRC the Fl lazyio bit is basically analogous to Fc and Fb in that
> > it means the client is allowed to buffer/cache, but only on those
> > file handles that have enabled lazyio via the ioctl. It's only
> > requested/wanted if there is at least one open file handle with lazyio
> > enabled, and the MDS is a bit more liberal in giving it to clients than
> > Fc or Fb because its users have opted to manage their client cache
> > consistency (writeback and invalidation) themselves.
> >
>
> Ok, so this is basically giving the application control over the
> caching? How does an application linked against libcephfs invalidate the
> cache when it detects that something has changed?
>
> Thanks again for the info!
> --
> Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
>
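
For anyone following along, here is a minimal sketch of how the F* masks
quoted in this thread compose, and how a short name such as Frw or Fcrwb
maps back onto the bits. The numeric values for the generic CEPH_CAP_G*
bits and for CEPH_CAP_SFILE are written from memory of ceph_fs.h and
should be treated as assumptions; check them against your tree. The two
helpers, file_caps_to_str() and file_caps_allow_data_io(), are
hypothetical names made up for illustration and are not part of any Ceph
API. The letter 'a' for WREXTEND is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMED values: generic cap bits and the File shift, from memory of
 * ceph_fs.h. Verify against the real header before relying on them. */
#define CEPH_CAP_GSHARED     1
#define CEPH_CAP_GEXCL       2
#define CEPH_CAP_GCACHE      4
#define CEPH_CAP_GRD         8
#define CEPH_CAP_GWR        16
#define CEPH_CAP_GBUFFER    32
#define CEPH_CAP_GWREXTEND  64
#define CEPH_CAP_GLAZYIO   128
#define CEPH_CAP_SFILE       8

/* The File cap masks as listed in this thread. */
#define CEPH_CAP_FILE_SHARED   (CEPH_CAP_GSHARED   << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_EXCL     (CEPH_CAP_GEXCL     << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_CACHE    (CEPH_CAP_GCACHE    << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_RD       (CEPH_CAP_GRD       << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_WR       (CEPH_CAP_GWR       << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_BUFFER   (CEPH_CAP_GBUFFER   << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_WREXTEND (CEPH_CAP_GWREXTEND << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_LAZYIO   (CEPH_CAP_GLAZYIO   << CEPH_CAP_SFILE)

/* Hypothetical helper: render a File cap mask in the short form used in
 * this thread (e.g. "Frw"). Returns buf, or NULL if buf is too small.
 * A buffer of 10 bytes always suffices ('F' + 8 letters + NUL). */
static const char *file_caps_to_str(unsigned caps, char *buf, size_t len)
{
    static const struct { unsigned bit; char letter; } map[] = {
        { CEPH_CAP_FILE_SHARED,   's' },  /* inode attrs, shared */
        { CEPH_CAP_FILE_EXCL,     'x' },  /* inode attrs, exclusive */
        { CEPH_CAP_FILE_CACHE,    'c' },  /* may cache reads */
        { CEPH_CAP_FILE_RD,       'r' },  /* may read data from OSDs */
        { CEPH_CAP_FILE_WR,       'w' },  /* may write data to OSDs */
        { CEPH_CAP_FILE_BUFFER,   'b' },  /* may buffer writes */
        { CEPH_CAP_FILE_WREXTEND, 'a' },  /* unused bit per this thread */
        { CEPH_CAP_FILE_LAZYIO,   'l' },  /* lazyio */
    };
    size_t n = 0, i;

    assert(buf != NULL);
    if (len < 2)
        return NULL;
    buf[n++] = 'F';
    for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        if (caps & map[i].bit) {
            if (n + 1 >= len)   /* keep room for the trailing NUL */
                return NULL;
            buf[n++] = map[i].letter;
        }
    }
    buf[n] = '\0';
    return buf;
}

/* Hypothetical helper: 1 if the mask permits any data I/O against the
 * OSDs (Fr or Fw), independent of the Fs/Fx attribute caps. */
static int file_caps_allow_data_io(unsigned caps)
{
    return (caps & (CEPH_CAP_FILE_RD | CEPH_CAP_FILE_WR)) != 0;
}
```

Calling file_caps_to_str(CEPH_CAP_FILE_RD | CEPH_CAP_FILE_WR, buf,
sizeof(buf)) with a 16-byte buf fills it with "Frw". The second helper
only restates the point made above that Fr/Fw govern data I/O while
Fs/Fx cover the inode attributes.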