On Mon, 2019-02-18 at 17:02 +0100, Paul Emmerich wrote:
> > > I've benchmarked a ~15% performance difference in IOPS between cache
> > > expiration time of 0 and 10 when running fio on a single file from a
> > > single client.
> > >
> >
> > NFS iops? I'd guess more READ ops in particular? Is that with a
> > FSAL_CEPH backend?
>
> Yes. But take that with a grain of salt, that was just a quick
> and dirty test of a very specific scenario that may or may not be
> relevant.
>

Sure.

If the NFS iops go up when you remove a layer of caching, then that
suggests that you had a situation where the cache likely should have
been invalidated, but wasn't. Basically, you may be sacrificing cache
coherency for performance.

The bigger question I have is whether the ganesha mdcache provides any
performance gain when the attributes are already cached in the
libcephfs layer.

If we did want to start using the mdcache, then we'd almost certainly
want to invalidate that cache when libcephfs gives up caps. I just
don't see how the extra layer of caching provides much value in that
situation.

> >
> > > > > On Thu, Feb 14, 2019 at 9:04 PM Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
> > > > > > On Thu, 2019-02-14 at 20:57 +0800, Marvin Zhang wrote:
> > > > > > > Here is the copy from https://tools.ietf.org/html/rfc7530#page-40
> > > > > > > Will the client query the 'change' attribute every time before reading
> > > > > > > to know if the data has been changed?
> > > > > > >
> > > > > > > +-----------------+----+------------+-----+-------------------+
> > > > > > > | Name            | ID | Data Type  | Acc | Defined in        |
> > > > > > > +-----------------+----+------------+-----+-------------------+
> > > > > > > | supported_attrs | 0  | bitmap4    | R   | Section 5.8.1.1   |
> > > > > > > | type            | 1  | nfs_ftype4 | R   | Section 5.8.1.2   |
> > > > > > > | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3   |
> > > > > > > | change          | 3  | changeid4  | R   | Section 5.8.1.4   |
> > > > > > > | size            | 4  | uint64_t   | R W | Section 5.8.1.5   |
> > > > > > > | link_support    | 5  | bool       | R   | Section 5.8.1.6   |
> > > > > > > | symlink_support | 6  | bool       | R   | Section 5.8.1.7   |
> > > > > > > | named_attr      | 7  | bool       | R   | Section 5.8.1.8   |
> > > > > > > | fsid            | 8  | fsid4      | R   | Section 5.8.1.9   |
> > > > > > > | unique_handles  | 9  | bool       | R   | Section 5.8.1.10  |
> > > > > > > | lease_time      | 10 | nfs_lease4 | R   | Section 5.8.1.11  |
> > > > > > > | rdattr_error    | 11 | nfsstat4   | R   | Section 5.8.1.12  |
> > > > > > > | filehandle      | 19 | nfs_fh4    | R   | Section 5.8.1.13  |
> > > > > > > +-----------------+----+------------+-----+-------------------+
> > > > > > >
> > > > > >
> > > > > > Not every time -- only when the cache needs revalidation.
> > > > > >
> > > > > > In the absence of a delegation, that happens on a timeout (see the
> > > > > > acregmin/acregmax settings in nfs(5)), though things like opens and file
> > > > > > locking events also affect when the client revalidates.
> > > > > >
> > > > > > When the v4 client does revalidate the cache, it relies heavily on the
> > > > > > NFSv4 change attribute. Cephfs's change attribute is cluster-coherent too,
> > > > > > so if the client does revalidate it should see changes made on other
> > > > > > servers.
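The revalidation rule described above can be sketched as a small model: cached data is trusted until an attribute timeout expires, and only then is the change attribute compared with the server's value. This is not the Linux NFS client code. The struct, its fields, and all three functions are hypothetical names chosen for illustration, and `attr_timeout` merely stands in for whatever value the client picks between acregmin and acregmax.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one client-side cache entry.
 * This is NOT the kernel's struct nfs_inode. */
struct cached_file {
    uint64_t change_attr;   /* change attribute seen at the last revalidation */
    long     validated_at;  /* time of the last revalidation, in seconds */
    long     attr_timeout;  /* assumed: some value between acregmin and acregmax */
    bool     data_valid;    /* false means cached data must be re-read */
};

/* Attributes are only rechecked once the timeout has elapsed. */
bool needs_revalidation(const struct cached_file *f, long now)
{
    assert(f != NULL);
    return now - f->validated_at >= f->attr_timeout;
}

/* Compare the server's change attribute (as a GETATTR would return it)
 * with the cached one; a mismatch invalidates the cached data. */
void revalidate(struct cached_file *f, uint64_t server_change, long now)
{
    assert(f != NULL);
    if (server_change != f->change_attr) {
        f->change_attr = server_change;
        f->data_valid = false;
    }
    f->validated_at = now;
}

/* Returns true if the read had to go to the server (a READ on the wire),
 * false if it was served from the local cache. */
bool read_file(struct cached_file *f, uint64_t server_change, long now)
{
    assert(f != NULL);
    if (needs_revalidation(f, now))
        revalidate(f, server_change, now);
    if (f->data_valid)
        return false;       /* cache hit, possibly stale within the timeout */
    f->data_valid = true;   /* data refetched and cached again */
    return true;
}
```

With `attr_timeout` set to 3, a cache validated at t=0, and the server's change attribute moving from 1 to 2 shortly afterwards, a read at t=2 is still served from the (stale) cache, a read at t=3 detects the change and goes to the wire, and a read at t=4 is served from cache again. Opens and locking events, which also trigger revalidation, are left out of this model.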
> > > > > >
> > > > > > >
> > > > > > > On Thu, Feb 14, 2019 at 8:29 PM Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
> > > > > > > > On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> > > > > > > > > Hi Jeff,
> > > > > > > > > Another question is about client caching when delegation is disabled.
> > > > > > > > > I set a breakpoint on nfs4_op_read, which is the OP_READ processing
> > > > > > > > > function in nfs-ganesha. Then I read a file and found that the
> > > > > > > > > breakpoint is hit only once, on the first read, which means later reads
> > > > > > > > > of this file do not trigger OP_READ. The data is read from the
> > > > > > > > > client-side cache. Is that right?
> > > > > > > >
> > > > > > > > Yes. In the absence of a delegation, the client will periodically query
> > > > > > > > for the inode attributes, and will serve reads from the cache if it
> > > > > > > > looks like the file hasn't changed.
> > > > > > > >
> > > > > > > > > I also checked the NFS client code in the Linux kernel. Only when
> > > > > > > > > cache_validity has NFS_INO_INVALID_DATA set will it send OP_READ again,
> > > > > > > > > like this:
> > > > > > > > > if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
> > > > > > > > >         ret = nfs_invalidate_mapping(inode, mapping);
> > > > > > > > > }
> > > > > > > > > Think about this scenario: client1 connects to ganesha1 and client2
> > > > > > > > > connects to ganesha2. I read /1.txt on client1 and client1 caches the
> > > > > > > > > data. Then I modify this file on client2. At that point, how does
> > > > > > > > > client1 know the file has been modified, and how does
> > > > > > > > > NFS_INO_INVALID_DATA get added to cache_validity?
> > > > > > > >
> > > > > > > > Once you modify the file on client2, ganesha2 will request the necessary
> > > > > > > > caps from the ceph MDS, and client1 will have its caps revoked. It'll
> > > > > > > > then make the change.
> > > > > > > >
> > > > > > > > When client1 reads again it will issue a GETATTR against the file [1].
> > > > > > > > ganesha1 will then request caps to do the getattr, which will end up
> > > > > > > > revoking ganesha2's caps. client1 will then see the change in attributes
> > > > > > > > (the change attribute and mtime, most likely) and will invalidate the
> > > > > > > > mapping, causing it to reissue a READ on the wire.
> > > > > > > >
> > > > > > > > [1]: There may be a window of time after you change the file on client2
> > > > > > > > where client1 doesn't see it. That's due to the fact that inode
> > > > > > > > attributes on the client are only revalidated after a timeout. You may
> > > > > > > > want to read over the DATA AND METADATA COHERENCE section of nfs(5) to
> > > > > > > > make sure you understand how the NFS client validates its caches.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > --
> > > > > > > > Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
> > > > > > > >
> > > > > >
> > > > > > --
> > > > > > Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
> > > > > >
> > > >
> > > > --
> > > > Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
> > > >
> >
> > --
> > Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
>

--
Jeff Layton <jlayton@xxxxxxxxxxxxxxx>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
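The two-gateway scenario in the thread (client2 writes through ganesha2, then client1 revalidates through ganesha1) can be walked through with a toy model of the caps handoff. This is a sketch under strong assumptions: it treats caps as a single exclusive token per file, which is much coarser than real Ceph capabilities, and every name in it (`mds_file`, `acquire_caps`, `gateway_write`, `gateway_getattr`, `NO_HOLDER`) is hypothetical rather than part of libcephfs or nfs-ganesha.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_HOLDER (-1)

/* Hypothetical stand-in for the state the MDS tracks for one file. */
struct mds_file {
    uint64_t change_attr;  /* cluster-coherent change attribute */
    int      cap_holder;   /* id of the gateway currently holding caps */
    unsigned revocations;  /* times caps were taken away from a holder */
};

/* A gateway asks for caps; any other holder has its caps revoked first.
 * Assumption: one exclusive token, unlike Ceph's finer-grained caps. */
void acquire_caps(struct mds_file *f, int gateway)
{
    assert(f != NULL);
    if (f->cap_holder != gateway) {
        if (f->cap_holder != NO_HOLDER)
            f->revocations++;
        f->cap_holder = gateway;
    }
}

/* A write through a gateway takes the caps, then bumps the change attribute. */
void gateway_write(struct mds_file *f, int gateway)
{
    acquire_caps(f, gateway);
    f->change_attr++;
}

/* A GETATTR through a gateway takes the caps, then reports the current
 * change attribute, so it reflects writes made through other gateways. */
uint64_t gateway_getattr(struct mds_file *f, int gateway)
{
    acquire_caps(f, gateway);
    return f->change_attr;
}
```

Starting from a file with change attribute 1 and no cap holder, a GETATTR through gateway 1 returns 1. A write through gateway 2 revokes gateway 1's caps and bumps the attribute to 2. The next GETATTR through gateway 1 revokes gateway 2's caps and returns 2, which is the mismatch that leads the NFS client to invalidate its mapping and reissue a READ. The model says nothing about when client1 issues that GETATTR; that is still governed by the attribute timeout mentioned in footnote [1].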