Re: [PATCH v5 0/2] ceph: metrics for opened files, pinned caps and opened inodes

On Fri, 2020-09-11 at 07:49 -0400, Jeff Layton wrote:
> On Fri, 2020-09-11 at 11:43 +0800, Xiubo Li wrote:
> > On 2020/9/10 20:13, Jeff Layton wrote:
> > > On Thu, 2020-09-10 at 08:00 +0200, Ilya Dryomov wrote:
> > > > On Thu, Sep 10, 2020 at 2:59 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > > > > On 2020/9/10 4:34, Ilya Dryomov wrote:
> > > > > > On Thu, Sep 3, 2020 at 4:22 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > > > > > > On 2020/9/3 22:18, Jeff Layton wrote:
> > > > > > > > On Thu, 2020-09-03 at 09:01 -0400, xiubli@xxxxxxxxxx wrote:
> > > > > > > > > From: Xiubo Li <xiubli@xxxxxxxxxx>
> > > > > > > > > 
> > > > > > > > > Changed in V5:
> > > > > > > > > - Remove mdsc parsing helpers except the ceph_sb_to_mdsc()
> > > > > > > > > - Remove the is_opened member.
> > > > > > > > > 
> > > > > > > > > Changed in V4:
> > > > > > > > > - A small fix about the total_inodes.
> > > > > > > > > 
> > > > > > > > > Changed in V3:
> > > > > > > > > - Resend for V2 just forgot one patch, which is adding some helpers
> > > > > > > > > support to simplify the code.
> > > > > > > > > 
> > > > > > > > > Changed in V2:
> > > > > > > > > - Add number of inodes that have opened files.
> > > > > > > > > - Remove the dir metrics and fold into files.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Xiubo Li (2):
> > > > > > > > >      ceph: add ceph_sb_to_mdsc helper support to parse the mdsc
> > > > > > > > >      ceph: metrics for opened files, pinned caps and opened inodes
> > > > > > > > > 
> > > > > > > > >     fs/ceph/caps.c    | 41 +++++++++++++++++++++++++++++++++++++----
> > > > > > > > >     fs/ceph/debugfs.c | 11 +++++++++++
> > > > > > > > >     fs/ceph/dir.c     | 20 +++++++-------------
> > > > > > > > >     fs/ceph/file.c    | 13 ++++++-------
> > > > > > > > >     fs/ceph/inode.c   | 11 ++++++++---
> > > > > > > > >     fs/ceph/locks.c   |  2 +-
> > > > > > > > >     fs/ceph/metric.c  | 14 ++++++++++++++
> > > > > > > > >     fs/ceph/metric.h  |  7 +++++++
> > > > > > > > >     fs/ceph/quota.c   | 10 +++++-----
> > > > > > > > >     fs/ceph/snap.c    |  2 +-
> > > > > > > > >     fs/ceph/super.h   |  6 ++++++
> > > > > > > > >     11 files changed, 103 insertions(+), 34 deletions(-)
> > > > > > > > > 
> > > > > > > > Looks good. I went ahead and merged this into testing.
> > > > > > > > 
> > > > > > > > Small merge conflict in quota.c, which I guess is probably due to not
> > > > > > > > basing this on the testing branch. I also dropped what looks like an
> > > > > > > > unrelated hunk in the second patch.
> > > > > > > > 
> > > > > > > > In the future, if you can be sure that patches you post apply cleanly to
> > > > > > > > the testing branch then that would make things easier.
> > > > > > > Okay, will do it.
> > > > > > Hi Xiubo,
> > > > > > 
> > > > > > There is a problem with lifetimes here.  mdsc isn't guaranteed to exist
> > > > > > when ->free_inode() is called.  This can lead to crashes on a NULL mdsc
> > > > > > in ceph_free_inode() in case of e.g. "umount -f".  I know it was Jeff's
> > > > > > suggestion to move the decrement of total_inodes into ceph_free_inode(),
> > > > > > but it doesn't look like it can be easily deferred past ->evict_inode().
> > > > > Okay, I will take a look.
> > > > Given that it's just a counter which we don't care about if the
> > > > mount is going away, some form of "if (mdsc)" check might do, but
> > > > need to make sure that it covers possible races, if any.
> > > > 
> > > Good catch, Ilya.
> > > 
> > > What may be best is to move the increment out of ceph_alloc_inode and
> > > instead put it in ceph_set_ino_cb. Then the decrement can go back into
> > > ceph_evict_inode.
> > 
> > Hi Jeff, Ilya
> > 
> > I checked the code, and it seems we will hit the same issue in
> > ceph_evict_inode() as well.
> > 
> > With the '-f' option, umount skips the inodes whose i_count is > 0 and
> > then frees the fsc/mdsc in ceph. Later, iput_final() calls
> > ceph_evict_inode() and then ceph_free_inode().
> > 
> > Could we just check whether SB_ACTIVE is set in sb->s_flags, and skip
> > the counting if it is not?
> > 
> 
> Note that umount -f (MNT_FORCE) just means that ceph_umount_begin is
> called before unmounting.
> 
> If what you're saying is true, then we have bigger problems.
> ceph_evict_inode does this today when ci->i_snap_realm is set:
> 
>     struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
> 
> ...and then goes on to use that mdsc pointer.
> 

Now that I look, I don't think that this is a problem. ceph_kill_sb
calls generic_shutdown_super, which calls evict_inodes before the client
is torn down. That should ensure that the mdsc is still good when evict
is called.
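
To make that ordering concrete, here is a rough user-space model of it.
This is not kernel code: every model_* name is made up for illustration,
and a plain long stands in for the percpu counter. It only sketches the
sequence described above, assuming evict always runs before the client is
freed and that the final free may happen afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NR_INODES 3

/* Hypothetical stand-ins for the kernel structures, not the real ones. */
struct model_mdsc { long total_inodes; };
struct model_sb { struct model_mdsc *mdsc; };
struct model_inode { struct model_sb *sb; };

/* Models the iget5_locked() "set" callback: count the inode when hashed. */
static void model_set_cb(struct model_inode *inode, struct model_sb *sb)
{
	inode->sb = sb;
	sb->mdsc->total_inodes++;
}

/* Models evict_inode: the client is assumed to still exist here. */
static void model_evict(struct model_inode *inode)
{
	assert(inode->sb->mdsc != NULL);
	inode->sb->mdsc->total_inodes--;
}

/* Models free_inode: may run after teardown, so it never touches mdsc. */
static void model_free(struct model_inode *inode)
{
	free(inode);
}

/* Returns the counter value observed just before the client is freed. */
static long model_run(void)
{
	struct model_sb sb;
	struct model_inode *inodes[MODEL_NR_INODES];
	long remaining;
	size_t i;

	sb.mdsc = calloc(1, sizeof(*sb.mdsc));
	assert(sb.mdsc != NULL);

	for (i = 0; i < MODEL_NR_INODES; i++) {
		inodes[i] = malloc(sizeof(*inodes[i]));
		assert(inodes[i] != NULL);
		model_set_cb(inodes[i], &sb);
	}
	assert(sb.mdsc->total_inodes == MODEL_NR_INODES);

	/* Shutdown step 1: evict every inode while the client is alive. */
	for (i = 0; i < MODEL_NR_INODES; i++)
		model_evict(inodes[i]);
	remaining = sb.mdsc->total_inodes;

	/* Shutdown step 2: tear down the client. */
	free(sb.mdsc);
	sb.mdsc = NULL;

	/* Step 3: the deferred frees run last and must not need mdsc. */
	for (i = 0; i < MODEL_NR_INODES; i++)
		model_free(inodes[i]);

	return remaining;
}
```

In this model the counter is balanced by the time the client goes away,
and the free step never dereferences the (now NULL) mdsc pointer.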

We will need to move the increment into the iget5_locked "set" function.
Maybe we can squash the patch below into yours?

----------------------8<---------------------------

ceph: use total_inodes to count hashed inodes instead of allocated ones

We can't guarantee that the mdsc will still be around when free_inode is
called, so move the decrement into evict_inode instead. The increment
then needs to happen when the inode is hashed, so move it into the
iget5_locked set callback.

Reported-by: Ilya Dryomov <idryomov@xxxxxxxxx>
Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
 fs/ceph/inode.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 5b9d2ff8af34..39c13fefba8a 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -42,10 +42,13 @@ static void ceph_inode_work(struct work_struct *work);
 static int ceph_set_ino_cb(struct inode *inode, void *data)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
+	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb);
 
 	ci->i_vino = *(struct ceph_vino *)data;
 	inode->i_ino = ceph_vino_to_ino_t(ci->i_vino);
 	inode_set_iversion_raw(inode, 0);
+	percpu_counter_inc(&mdsc->metric.total_inodes);
+
 	return 0;
 }
 
@@ -425,7 +428,6 @@ static int ceph_fill_fragtree(struct inode *inode,
  */
 struct inode *ceph_alloc_inode(struct super_block *sb)
 {
-	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(sb);
 	struct ceph_inode_info *ci;
 	int i;
 
@@ -525,17 +527,12 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
 
 	ci->i_meta_err = 0;
 
-	percpu_counter_inc(&mdsc->metric.total_inodes);
-
 	return &ci->vfs_inode;
 }
 
 void ceph_free_inode(struct inode *inode)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
-	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb);
-
-	percpu_counter_dec(&mdsc->metric.total_inodes);
 
 	kfree(ci->i_symlink);
 	kmem_cache_free(ceph_inode_cachep, ci);
@@ -544,11 +541,14 @@ void ceph_free_inode(struct inode *inode)
 void ceph_evict_inode(struct inode *inode)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
+	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb);
 	struct ceph_inode_frag *frag;
 	struct rb_node *n;
 
 	dout("evict_inode %p ino %llx.%llx\n", inode, ceph_vinop(inode));
 
+	percpu_counter_dec(&mdsc->metric.total_inodes);
+
 	truncate_inode_pages_final(&inode->i_data);
 	clear_inode(inode);
 
-- 
2.26.2




