On Tue, Mar 12, 2019 at 1:10 PM Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
>
> On Tue, 2019-03-12 at 18:08 +0000, Sage Weil wrote:
> > On Tue, 12 Mar 2019, Gregory Farnum wrote:
> > > On Tue, Mar 12, 2019 at 9:46 AM Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
> > > >
> > > > I have more questions about MDS caps. The File (F*) caps in cephfs are
> > > > very granular, such that it's not clear what extra ability each one
> > > > grants with respect to the others. Here's the list:
> > > >
> > > > #define CEPH_CAP_FILE_SHARED (CEPH_CAP_GSHARED << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_EXCL (CEPH_CAP_GEXCL << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_CACHE (CEPH_CAP_GCACHE << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_RD (CEPH_CAP_GRD << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_WR (CEPH_CAP_GWR << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_BUFFER (CEPH_CAP_GBUFFER << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_WREXTEND (CEPH_CAP_GWREXTEND << CEPH_CAP_SFILE)
> > > > #define CEPH_CAP_FILE_LAZYIO (CEPH_CAP_GLAZYIO << CEPH_CAP_SFILE)
> > > >
> > > > My questions:
> > > >
> > > > 1) Why do we have SHARED and CACHE (and similarly EXCL and BUFFER)?
> > > > Shouldn't one imply the other? Under what circumstances would you issue
> > > > them independently of one another?
> > >
> > > CACHE and BUFFER are special kinds of caps. They apply only to the
> > > FILE inode cap, and they refer to whether you can cache or buffer data
> > > extents of the file. SHARED and EXCL refer to inode attributes and
> > > apply to every kind of cap.
> > > They certainly move together often (perhaps always?), but they are
> > > distinct because CACHE and BUFFER are "extra".
> > >
> > > > 2) My understanding (quite possibly wrong) is that RD and WR are really
> > > > there to cover the validity of the file layout. Should SHARED/EXCL imply
> > > > those as well?
> > >
> > > Ah, that's not right. RD and WR also apply only to File caps, and mean
> > > that you can read and write the file data from the OSDs.
> >
> > Ok.
>
> Just so I'm clear -- in what situations do we revoke Fr or Fw caps?
> Looking at the code it appears that that is done when the MDS is in
> recovery, and maybe when a client that doesn't handle inline data
> encounters such an inode?

I'm not going to pretend to remember the scenarios, but in theory it
could happen because we decide a client isn't using the caps and we
want to give somebody else a CACHE or BUFFER that conflicts with other
people doing reads and writes. (In practice, I think that will only
happen if a client voluntarily drops the Frw from its wanted set.)
IIRC, in the normal course of events these are caps that clients might
not have yet (because they were just looking at the directory contents
and these caps are only given out on request [and maybe file create?])
and add later, or drop when a file is closed and goes idle.

> > > > 4) LAZYIO is there, but its semantics are not documented at all, AFAICT.
> > > > I get that it's supposed to relax ceph's caching semantics. Under what
> > > > circumstances _should_ the client invalidate cached dentries and inodes
> > > > when this is set? IOW, what are the lazyio "rules"?
> > >
> > > I don't think these are very well specified, especially around dentry
> > > and inode caches. IIRC LAZYIO was created in anticipation of the
> > > proposed Linux LAZYIO extensions, but in CephFS applies mostly to
> > > cached file data to allow conflicting caches and buffers in situations
> > > where clients handle their own consistency bounds (ie, HPC
> > > applications where each client gets its own range of a file to play in
> > > and doesn't touch that of anybody else.)
> >
> > IIRC the Fl lazyio bit is basically analogous to Fc and Fb in that
> > it means the client is allowed to buffer/cache, but only on those
> > file handles that have enabled lazyio via the ioctl. It's only
> > requested/wanted if there is at least one open file handle with lazyio
> > enabled, and the MDS is a bit more liberal in giving it to clients than
> > Fc or Fb because its users have opted to manage their client cache
> > consistency (writeback and invalidation) themselves.
> >
>
> Ok, so this is basically giving the application control over the
> caching? How does an application linked against libcephfs invalidate the
> cache when it detects that something has changed?

I believe the kernel client has ioctls that can handle this (the same
ones that get used with MMAP for flushing and refreshing the page
cache?). It's possible the user space client doesn't -- I'm not sure
lazyio was ever completely implemented since the interfaces weren't
actually put in the kernel. I do see lazyio_propogate() and
lazyio_synchronize() in the Client, but those both appear to do an
fsync(); the second tries _release() which sometimes does
_invalidate_inode_cache() but I'm not sure if it's working (since it
only does that when the client isn't using Fc, which I suspect it
always is in lazyio?)
-Greg
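
For reference, here is a minimal sketch of how the F* bits discussed in
this thread compose and how the Fc/Fb/Fl distinction described above
might be checked. The numeric values of the generic bits and of
CEPH_CAP_SFILE are written from memory of ceph_fs.h and should be
treated as assumptions to verify against your tree. The helper names
(file_caps_to_str, may_cache_data, may_buffer_writes) are hypothetical
and do not exist in either client; the lazyio rule encoded here is only
the simplified one stated in the thread (Fl behaves like Fc/Fb, but only
for file handles that enabled lazyio).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic cap bits and the File field shift. ASSUMPTION: values recalled
 * from ceph_fs.h, not taken from this thread. */
#define CEPH_CAP_GSHARED     1
#define CEPH_CAP_GEXCL       2
#define CEPH_CAP_GCACHE      4
#define CEPH_CAP_GRD         8
#define CEPH_CAP_GWR        16
#define CEPH_CAP_GBUFFER    32
#define CEPH_CAP_GWREXTEND  64
#define CEPH_CAP_GLAZYIO   128
#define CEPH_CAP_SFILE       8

/* The File caps exactly as listed in the thread. */
#define CEPH_CAP_FILE_SHARED   (CEPH_CAP_GSHARED   << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_EXCL     (CEPH_CAP_GEXCL     << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_CACHE    (CEPH_CAP_GCACHE    << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_RD       (CEPH_CAP_GRD       << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_WR       (CEPH_CAP_GWR       << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_BUFFER   (CEPH_CAP_GBUFFER   << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_WREXTEND (CEPH_CAP_GWREXTEND << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_LAZYIO   (CEPH_CAP_GLAZYIO   << CEPH_CAP_SFILE)

/* Hypothetical helper: render the File caps in the short "Frw" style
 * notation. buf must hold at least 10 bytes ('F' + 8 letters + NUL). */
static const char *file_caps_to_str(unsigned int caps, char *buf)
{
    static const char letters[] = "sxcrwbal"; /* bit 0 .. bit 7 */
    unsigned int generic = (caps >> CEPH_CAP_SFILE) & 0xffu;
    size_t n = 0;

    assert(buf != NULL);
    buf[n++] = 'F';
    for (size_t i = 0; i < 8; i++)
        if (generic & (1u << i))
            buf[n++] = letters[i];
    buf[n] = '\0';
    return buf;
}

/* Hypothetical helper: may this file handle cache file data? Fc allows
 * it outright; Fl allows it only if the handle opted in to lazyio. */
static int may_cache_data(unsigned int caps, int fh_lazyio)
{
    return (caps & CEPH_CAP_FILE_CACHE) ||
           (fh_lazyio && (caps & CEPH_CAP_FILE_LAZYIO));
}

/* Hypothetical helper: same rule for buffering writes, with Fb. */
static int may_buffer_writes(unsigned int caps, int fh_lazyio)
{
    return (caps & CEPH_CAP_FILE_BUFFER) ||
           (fh_lazyio && (caps & CEPH_CAP_FILE_LAZYIO));
}
```

With these definitions, a client holding only Fr and Fl would read
through to the OSDs on an ordinary file handle but could serve reads
from its cache on a handle that enabled lazyio, which is the behaviour
Sage describes. Whether a real client also requires Fr/Fw before
consulting Fc/Fb/Fl is deliberately left out of this sketch.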