Re: [PATCH v5 0/2] ceph: metrics for opened files, pinned caps and opened inodes

On Sun, 2020-09-13 at 12:40 +0200, Ilya Dryomov wrote:
> On Fri, Sep 11, 2020 at 9:46 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > On Fri, 2020-09-11 at 07:49 -0400, Jeff Layton wrote:
> > > On Fri, 2020-09-11 at 11:43 +0800, Xiubo Li wrote:
> > > > On 2020/9/10 20:13, Jeff Layton wrote:
> > > > > On Thu, 2020-09-10 at 08:00 +0200, Ilya Dryomov wrote:
> > > > > > On Thu, Sep 10, 2020 at 2:59 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > > > > > > On 2020/9/10 4:34, Ilya Dryomov wrote:
> > > > > > > > On Thu, Sep 3, 2020 at 4:22 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > > > > > > > > On 2020/9/3 22:18, Jeff Layton wrote:
> > > > > > > > > > On Thu, 2020-09-03 at 09:01 -0400, xiubli@xxxxxxxxxx wrote:
> > > > > > > > > > > From: Xiubo Li <xiubli@xxxxxxxxxx>
> > > > > > > > > > > 
> > > > > > > > > > > Changed in V5:
> > > > > > > > > > > - Remove mdsc parsing helpers except the ceph_sb_to_mdsc()
> > > > > > > > > > > - Remove the is_opened member.
> > > > > > > > > > > 
> > > > > > > > > > > Changed in V4:
> > > > > > > > > > > - A small fix about the total_inodes.
> > > > > > > > > > > 
> > > > > > > > > > > Changed in V3:
> > > > > > > > > > > - Resend for V2 just forgot one patch, which is adding some helpers
> > > > > > > > > > > support to simplify the code.
> > > > > > > > > > > 
> > > > > > > > > > > Changed in V2:
> > > > > > > > > > > - Add number of inodes that have opened files.
> > > > > > > > > > > - Remove the dir metrics and fold into files.
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Xiubo Li (2):
> > > > > > > > > > >      ceph: add ceph_sb_to_mdsc helper support to parse the mdsc
> > > > > > > > > > >      ceph: metrics for opened files, pinned caps and opened inodes
> > > > > > > > > > > 
> > > > > > > > > > >     fs/ceph/caps.c    | 41 +++++++++++++++++++++++++++++++++++++----
> > > > > > > > > > >     fs/ceph/debugfs.c | 11 +++++++++++
> > > > > > > > > > >     fs/ceph/dir.c     | 20 +++++++-------------
> > > > > > > > > > >     fs/ceph/file.c    | 13 ++++++-------
> > > > > > > > > > >     fs/ceph/inode.c   | 11 ++++++++---
> > > > > > > > > > >     fs/ceph/locks.c   |  2 +-
> > > > > > > > > > >     fs/ceph/metric.c  | 14 ++++++++++++++
> > > > > > > > > > >     fs/ceph/metric.h  |  7 +++++++
> > > > > > > > > > >     fs/ceph/quota.c   | 10 +++++-----
> > > > > > > > > > >     fs/ceph/snap.c    |  2 +-
> > > > > > > > > > >     fs/ceph/super.h   |  6 ++++++
> > > > > > > > > > >     11 files changed, 103 insertions(+), 34 deletions(-)
> > > > > > > > > > > 
> > > > > > > > > > > Looks good. I went ahead and merged this into testing.
> > > > > > > > > > 
> > > > > > > > > > Small merge conflict in quota.c, which I guess is probably due to not
> > > > > > > > > > basing this on testing branch. I also dropped what looks like an
> > > > > > > > > > unrelated hunk in the second patch.
> > > > > > > > > > 
> > > > > > > > > > In the future, if you can be sure that patches you post apply cleanly to
> > > > > > > > > > testing branch then that would make things easier.
> > > > > > > > > Okay, will do it.
> > > > > > > > Hi Xiubo,
> > > > > > > > 
> > > > > > > > There is a problem with lifetimes here.  mdsc isn't guaranteed to exist
> > > > > > > > when ->free_inode() is called.  This can lead to crashes on a NULL mdsc
> > > > > > > > in ceph_free_inode() in case of e.g. "umount -f".  I know it was Jeff's
> > > > > > > > suggestion to move the decrement of total_inodes into ceph_free_inode(),
> > > > > > > > but it doesn't look like it can be easily deferred past ->evict_inode().
> > > > > > > Okay, I will take a look.
> > > > > > Given that it's just a counter which we don't care about if the
> > > > > > mount is going away, some form of "if (mdsc)" check might do, but
> > > > > > need to make sure that it covers possible races, if any.
> > > > > > 
> > > > > Good catch, Ilya.
> > > > > 
> > > > > What may be best is to move the increment out of ceph_alloc_inode and
> > > > > instead put it in ceph_set_ino_cb. Then the decrement can go back into
> > > > > ceph_evict_inode.
> > > > 
> > > > Hi Jeff, Ilya
> > > > 
> > > > I checked the code, and it seems we will also hit the same issue in
> > > > ceph_evict_inode().
> > > > 
> > > > With the '-f' option, umount will skip the inodes whose i_count ref
> > > > is > 0 and then free the fsc/mdsc in ceph. Later, iput_final() will
> > > > call ceph_evict_inode() and then ceph_free_inode().
> > > > 
> > > > Could we just skip the counting when (sb->s_flags & SB_ACTIVE) is
> > > > false?
> > > > 
> > > 
> > > Note that umount -f (MNT_FORCE) just means that ceph_umount_begin is
> > > called before unmounting.
> > > 
> > > If what you're saying is true, then we have bigger problems.
> > > ceph_evict_inode does this today when ci->i_snap_realm is set:
> > > 
> > >     struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
> > > 
> > > ...and then goes on to use that mdsc pointer.
> > > 
> > 
> > Now that I look, I don't think that this is a problem. ceph_kill_sb
> > calls generic_shutdown_super, which calls evict_inodes before the client
> > is torn down. That should ensure that the mdsc is still good when evict
> > is called.
> > 
> > We will need to move the increment into the iget5_locked "set" function.
> > Maybe we can squash the patch below into yours?
> > 
> > ----------------------8<---------------------------
> > 
> > ceph: use total_inodes to count hashed inodes instead of allocated ones
> > 
> > We can't guarantee that the mdsc will still be around when free_inode is
> > called, so move this into evict_inode instead. The increment then will
> > need to be moved when the thing is hashed, so move that into the set
> > callback.
> > 
> > Reported-by: Ilya Dryomov <idryomov@xxxxxxxxx>
> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > ---
> >  fs/ceph/inode.c | 12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> > index 5b9d2ff8af34..39c13fefba8a 100644
> > --- a/fs/ceph/inode.c
> > +++ b/fs/ceph/inode.c
> > @@ -42,10 +42,13 @@ static void ceph_inode_work(struct work_struct *work);
> >  static int ceph_set_ino_cb(struct inode *inode, void *data)
> >  {
> >         struct ceph_inode_info *ci = ceph_inode(inode);
> > +       struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb);
> > 
> >         ci->i_vino = *(struct ceph_vino *)data;
> >         inode->i_ino = ceph_vino_to_ino_t(ci->i_vino);
> >         inode_set_iversion_raw(inode, 0);
> > +       percpu_counter_inc(&mdsc->metric.total_inodes);
> > +
> >         return 0;
> >  }
> > 
> > @@ -425,7 +428,6 @@ static int ceph_fill_fragtree(struct inode *inode,
> >   */
> >  struct inode *ceph_alloc_inode(struct super_block *sb)
> >  {
> > -       struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(sb);
> >         struct ceph_inode_info *ci;
> >         int i;
> > 
> > @@ -525,17 +527,12 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
> > 
> >         ci->i_meta_err = 0;
> > 
> > -       percpu_counter_inc(&mdsc->metric.total_inodes);
> > -
> >         return &ci->vfs_inode;
> >  }
> > 
> >  void ceph_free_inode(struct inode *inode)
> >  {
> >         struct ceph_inode_info *ci = ceph_inode(inode);
> > -       struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb);
> > -
> > -       percpu_counter_dec(&mdsc->metric.total_inodes);
> > 
> >         kfree(ci->i_symlink);
> >         kmem_cache_free(ceph_inode_cachep, ci);
> > @@ -544,11 +541,14 @@ void ceph_free_inode(struct inode *inode)
> >  void ceph_evict_inode(struct inode *inode)
> >  {
> >         struct ceph_inode_info *ci = ceph_inode(inode);
> > +       struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb);
> 
> I'd also remove a duplicate mdsc variable declared in ci->i_snap_realm
> branch.
> 

Good catch. Fixed in testing branch.
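
For anyone following along, the lifetime rule this thread converged on
can be modelled in userspace. This is only a sketch under stated
assumptions: every model_* name is a hypothetical stand-in rather than a
kernel structure or function, and a plain long replaces the percpu
counter. It shows the counter going up when the inode is hashed (the
"set" callback), coming back down in evict while the mdsc is still
valid, and free never touching the mdsc at all.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; not the real ones. */
struct model_mdsc {
	long total_inodes;		/* models mdsc->metric.total_inodes */
};

struct model_sb {
	struct model_mdsc *mdsc;	/* NULL once the client is torn down */
};

struct model_inode {
	struct model_sb *sb;
};

/* Models ceph_set_ino_cb(): count the inode when it gets hashed. */
static void model_set_ino(struct model_inode *inode, struct model_sb *sb)
{
	inode->sb = sb;
	sb->mdsc->total_inodes++;
}

/* Models ceph_evict_inode(): runs while the mdsc is still valid. */
static void model_evict(struct model_inode *inode)
{
	inode->sb->mdsc->total_inodes--;
}

/* Models ceph_free_inode(): may run late, so it never touches mdsc. */
static void model_free(struct model_inode *inode)
{
	free(inode);
}

/*
 * Walks two inodes through hash -> evict -> (teardown) -> free and
 * returns the counter value seen just before the client goes away,
 * or -1 if allocation failed.
 */
static long model_unmount_sequence(void)
{
	struct model_mdsc mdsc = { 0 };
	struct model_sb sb = { &mdsc };
	struct model_inode *a = malloc(sizeof(*a));
	struct model_inode *b = malloc(sizeof(*b));
	long seen;

	if (!a || !b) {
		free(a);
		free(b);
		return -1;
	}

	model_set_ino(a, &sb);
	model_set_ino(b, &sb);
	assert(mdsc.total_inodes == 2);

	/* Eviction happens before the client is torn down. */
	model_evict(a);
	model_evict(b);
	seen = mdsc.total_inodes;

	sb.mdsc = NULL;			/* client torn down */

	/* Freeing after teardown is safe because it ignores sb->mdsc. */
	model_free(a);
	model_free(b);
	return seen;
}
```

Calling model_unmount_sequence() from a test harness should leave the
counter balanced. If the decrement lived in model_free() instead, it
would dereference the NULL mdsc after teardown, which is the crash Ilya
described.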

Thanks,
-- 
Jeff Layton <jlayton@xxxxxxxxxx>



