Re: Fwd: NAS solution for CephFS

On Mon, 2019-02-18 at 16:40 +0100, Paul Emmerich wrote:
> > A call into libcephfs from ganesha to retrieve cached attributes is
> > mostly just in-memory copies within the same process, so any performance
> > overhead there is pretty minimal. If we need to go to the network to get
> > the attributes, then that was a case where the cache should have been
> > invalidated anyway, and we avoid having to check the validity of the
> > cache.
> 
> I've benchmarked a ~15% difference in IOPS between attribute cache
> expiration times of 0 and 10 seconds when running fio on a single file
> from a single client.
> 
> 

NFS IOPS? I'd guess the difference is mostly extra READ ops? Is that
with an FSAL_CEPH backend?
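
For anyone wanting to reproduce that comparison, something along these
lines should work. Both snippets are illustrative sketches, not Paul's
actual setup: the mount path, job sizes, and config values are
assumptions (and the config block name varies by ganesha version):

    # fio against a file on an NFS mount of the ganesha export
    fio --name=cachetest --filename=/mnt/nfs/testfile \
        --ioengine=libaio --direct=1 --rw=randread \
        --bs=4k --iodepth=16 --size=1g --runtime=60

    # ganesha.conf: the attribute cache expiration being compared (0 vs. 10s)
    CACHEINODE {
        Attr_Expiration_Time = 0;
    }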


> 
> > 
> > > On Thu, Feb 14, 2019 at 9:04 PM Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
> > > > On Thu, 2019-02-14 at 20:57 +0800, Marvin Zhang wrote:
> > > > > Here is a copy from https://tools.ietf.org/html/rfc7530#page-40.
> > > > > Will the client query the 'change' attribute every time before reading,
> > > > > to know whether the data has changed?
> > > > > 
> > > > >       +-----------------+----+------------+-----+-------------------+
> > > > >       | Name            | ID | Data Type  | Acc | Defined in        |
> > > > >       +-----------------+----+------------+-----+-------------------+
> > > > >       | supported_attrs | 0  | bitmap4    | R   | Section 5.8.1.1   |
> > > > >       | type            | 1  | nfs_ftype4 | R   | Section 5.8.1.2   |
> > > > >       | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3   |
> > > > >       | change          | 3  | changeid4  | R   | Section 5.8.1.4   |
> > > > >       | size            | 4  | uint64_t   | R W | Section 5.8.1.5   |
> > > > >       | link_support    | 5  | bool       | R   | Section 5.8.1.6   |
> > > > >       | symlink_support | 6  | bool       | R   | Section 5.8.1.7   |
> > > > >       | named_attr      | 7  | bool       | R   | Section 5.8.1.8   |
> > > > >       | fsid            | 8  | fsid4      | R   | Section 5.8.1.9   |
> > > > >       | unique_handles  | 9  | bool       | R   | Section 5.8.1.10  |
> > > > >       | lease_time      | 10 | nfs_lease4 | R   | Section 5.8.1.11  |
> > > > >       | rdattr_error    | 11 | nfsstat4   | R   | Section 5.8.1.12  |
> > > > >       | filehandle      | 19 | nfs_fh4    | R   | Section 5.8.1.13  |
> > > > >       +-----------------+----+------------+-----+-------------------+
> > > > > 
> > > > 
> > > > Not every time -- only when the cache needs revalidation.
> > > > 
> > > > In the absence of a delegation, that happens on a timeout (see the
> > > > acregmin/acregmax settings in nfs(5)), though things like opens and file
> > > > locking events also affect when the client revalidates.
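> > > > 
> > > > For example, something like this on the client (mount point and values
> > > > are just illustrative; see nfs(5) for the defaults):
> > > > 
> > > >     # attrs revalidated no sooner than 3s and no later than 60s after caching
> > > >     mount -t nfs4 -o acregmin=3,acregmax=60 server:/export /mnt/nfs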
> > > > 
> > > > When the v4 client does revalidate the cache, it relies heavily on the
> > > > NFSv4 change attribute. CephFS's change attribute is cluster-coherent
> > > > too, so if the client does revalidate, it should see changes made via
> > > > other servers.
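> > > > 
> > > > In rough pseudocode (not the actual kernel code; the names are made
> > > > up), the revalidation looks like:
> > > > 
> > > >     /* on read: serve from the page cache unless the attr cache timed out */
> > > >     if (now - inode->last_reval >= timeout) {      /* acregmin..acregmax */
> > > >             GETATTR(fh, &attrs);                   /* over the wire */
> > > >             if (attrs.change != inode->cached_change)
> > > >                     invalidate_pagecache(inode);   /* next read -> READ op */
> > > >             inode->cached_change = attrs.change;
> > > >             inode->last_reval = now;
> > > >     }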
> > > > 
> > > > > On Thu, Feb 14, 2019 at 8:29 PM Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
> > > > > > On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> > > > > > > Hi Jeff,
> > > > > > > Another question is about client caching when delegations are disabled.
> > > > > > > I set a breakpoint on nfs4_op_read, the OP_READ handler in nfs-ganesha,
> > > > > > > and then read a file. The breakpoint is hit only on the first read;
> > > > > > > later reads of the file don't trigger OP_READ and are served from the
> > > > > > > client-side cache. Is that right?
> > > > > > 
> > > > > > Yes. In the absence of a delegation, the client will periodically query
> > > > > > for the inode attributes, and will serve reads from the cache if it
> > > > > > looks like the file hasn't changed.
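> > > > > > 
> > > > > > You can watch that from the client side: once the file is cached,
> > > > > > the GETATTR count keeps growing on revalidation while READ stays
> > > > > > flat.
> > > > > > 
> > > > > >     nfsstat -c    # per-op RPC counts for the NFS client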
> > > > > > 
> > > > > > > I also checked the NFS client code in the Linux kernel. Only when
> > > > > > > cache_validity has NFS_INO_INVALID_DATA set will it send OP_READ
> > > > > > > again, like this (in nfs_revalidate_mapping(), fs/nfs/inode.c):
> > > > > > >     if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
> > > > > > >         ret = nfs_invalidate_mapping(inode, mapping);
> > > > > > >     }
> > > > > > > Consider this scenario: client1 connects to ganesha1 and client2
> > > > > > > connects to ganesha2. I read /1.txt on client1, so client1 caches
> > > > > > > the data. Then I modify the file on client2. At that point, how does
> > > > > > > client1 know the file was modified, and how does NFS_INO_INVALID_DATA
> > > > > > > get set in cache_validity?
> > > > > > 
> > > > > > Once you modify the file on client2, ganesha2 will request the necessary
> > > > > > caps from the Ceph MDS, and ganesha1 will have its caps revoked. ganesha2
> > > > > > can then make the change.
> > > > > > 
> > > > > > When client1 reads again it will issue a GETATTR against the file [1].
> > > > > > ganesha1 will then request caps to do the getattr, which will end up
> > > > > > revoking ganesha2's caps. client1 will then see the change in attributes
> > > > > > (the change attribute and mtime, most likely) and will invalidate the
> > > > > > mapping, causing it to reissue a READ on the wire.
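> > > > > > 
> > > > > > Spelled out as a trace (cap names are Ceph's shorthand: Fr = file
> > > > > > read, Fw = file write, Fc = file cache; the exact cap sets involved
> > > > > > are a simplification):
> > > > > > 
> > > > > >     client2 WRITE -> ganesha2 requests Fw caps from the MDS
> > > > > >                   -> MDS revokes ganesha1's Fr/Fc caps
> > > > > >     client1 READ  -> attr cache timed out -> GETATTR to ganesha1
> > > > > >                   -> ganesha1 requests caps; MDS revokes ganesha2's caps,
> > > > > >                      forcing it to flush dirty data first
> > > > > >                   -> client1 sees a new change attr (and mtime),
> > > > > >                      invalidates its mapping, reissues READ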
> > > > > > 
> > > > > > [1]: There may be a window of time after you change the file on client2
> > > > > > where client1 doesn't see it. That's due to the fact that inode
> > > > > > attributes on the client are only revalidated after a timeout. You may
> > > > > > want to read over the DATA AND METADATA COHERENCE section of nfs(5) to
> > > > > > make sure you understand how the NFS client validates its caches.
> > > > > > 
> > > > > > Cheers,
> > > > > > --
> > > > > > Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
> > > > > > 
> > > > 
> > > > --
> > > > Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
> > > > 
> > 
> > --
> > Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
> > 

-- 
Jeff Layton <jlayton@xxxxxxxxxxxxxxx>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


