Re: [PATCH v2 2/8] ceph: add caps perf metric for each session

Jeff Layton <jlayton@xxxxxxxxxx> · Fri, 10 Jan 2020 06:51:30 -0500

On Fri, 2020-01-10 at 11:38 +0800, Xiubo Li wrote:
> On 2020/1/9 22:52, Jeff Layton wrote:
> > On Wed, 2020-01-08 at 05:41 -0500, xiubli@xxxxxxxxxx wrote:
> > > From: Xiubo Li <xiubli@xxxxxxxxxx>
> > > 
> > > This will fulfill the caps hit/miss metric for each session. When
> > > checking the "need" mask and if one cap has the subset of the "need"
> > > mask it means hit, or missed.
> > > 
> > > item          total           miss            hit
> > > -------------------------------------------------
> > > d_lease       295             0               993
> > > 
> > > session       caps            miss            hit
> > > -------------------------------------------------
> > > 0             295             107             4119
> > > 1             1               107             9
> > > 
> > > Fixes: https://tracker.ceph.com/issues/43215
> > For the record, "Fixes:" has a different meaning for kernel patches.
> > It's used to reference an earlier patch that introduced the bug that the
> > patch is fixing.
> > 
> > It's a pity that the ceph team decided to use that to reference tracker
> > tickets in their tree. For the kernel we usually use a generic "URL:"
> > tag for that.
> 
> Sure, will fix it.
> 
> [...]
> > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > index 28ae0c134700..6ab02aab7d9c 100644
> > > --- a/fs/ceph/caps.c
> > > +++ b/fs/ceph/caps.c
> > > @@ -567,7 +567,7 @@ static void __cap_delay_cancel(struct ceph_mds_client *mdsc,
> > >   static void __check_cap_issue(struct ceph_inode_info *ci, struct ceph_cap *cap,
> > >   			      unsigned issued)
> > >   {
> > > -	unsigned had = __ceph_caps_issued(ci, NULL);
> > > +	unsigned int had = __ceph_caps_issued(ci, NULL, -1);
> > >   
> > >   	/*
> > >   	 * Each time we receive FILE_CACHE anew, we increment
> > > @@ -787,20 +787,43 @@ static int __cap_is_valid(struct ceph_cap *cap)
> > >    * out, and may be invalidated in bulk if the client session times out
> > >    * and session->s_cap_gen is bumped.
> > >    */
> > > -int __ceph_caps_issued(struct ceph_inode_info *ci, int *implemented)
> > > +int __ceph_caps_issued(struct ceph_inode_info *ci, int *implemented, int mask)
> > 
> > This seems like the wrong approach. This function returns a set of caps,
> > so it seems like the callers ought to determine whether a miss or hit
> > occurred, and whether to record it.
> 
> Currently this approach will count the hit/miss for each i_cap entity in 
> ci->i_caps, for example, if a i_cap has a subset of the requested cap 
> mask it means the i_cap hit, or the i_cap miss.
> 
> This approach will be like:
> 
> session       caps            miss            hit
> -------------------------------------------------
> 0             295             107             4119
> 1             1               107             9
> 
> The "caps" here is the total i_caps in all the ceph_inodes we have.
> 
> 
> Another approach is only when the ci->i_caps have all the requested cap 
> mask, it means hit, or miss, this is what you meant as above.
> 
> This approach will be like:
> 
> session       inodes            miss            hit
> -------------------------------------------------
> 0             295             107             4119
> 1             1               107             9
> 
> The "inodes" here is the total ceph_inodes we have.
> 
> Which one will be better ?
> 
> 

I think I wasn't clear. My objection was to adding this "mask" field to
__ceph_caps_issued and having the counting logic in there. It would be
cleaner to have the callers do that instead. __ceph_caps_issued returns
the issued caps, so the callers have all of the information they need to
increment the proper counters without having to change
__ceph_caps_issued.

> > >   {
> [...]
> > >   
> > > diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> > > index 382beb04bacb..1e1ccae8953d 100644
> > > --- a/fs/ceph/dir.c
> > > +++ b/fs/ceph/dir.c
> > > @@ -30,7 +30,7 @@
> > >   const struct dentry_operations ceph_dentry_ops;
> > >   
> > >   static bool __dentry_lease_is_valid(struct ceph_dentry_info *di);
> > > -static int __dir_lease_try_check(const struct dentry *dentry);
> > > +static int __dir_lease_try_check(const struct dentry *dentry, bool metric);
> > >   
> > AFAICT, this function is only called when trimming dentries and in
> > d_delete. I don't think we care about measuring cache hits/misses for
> > either of those cases.
> 
> Yeah, it is.
> 
> This will ignore the trimming dentries case, and will count from the 
> d_delete.
> 
> This approach will only count the cap hit/miss called from VFS layer.
> 

Why do you need this "metric" parameter here? We _know_ that we won't be
counting hits and misses in this codepath, so it doesn't seem to serve
any useful purpose.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>