On Fri, Sep 11, 2020 at 9:46 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > On Fri, 2020-09-11 at 07:49 -0400, Jeff Layton wrote: > > On Fri, 2020-09-11 at 11:43 +0800, Xiubo Li wrote: > > > On 2020/9/10 20:13, Jeff Layton wrote: > > > > On Thu, 2020-09-10 at 08:00 +0200, Ilya Dryomov wrote: > > > > > On Thu, Sep 10, 2020 at 2:59 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote: > > > > > > On 2020/9/10 4:34, Ilya Dryomov wrote: > > > > > > > On Thu, Sep 3, 2020 at 4:22 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote: > > > > > > > > On 2020/9/3 22:18, Jeff Layton wrote: > > > > > > > > > On Thu, 2020-09-03 at 09:01 -0400, xiubli@xxxxxxxxxx wrote: > > > > > > > > > > From: Xiubo Li <xiubli@xxxxxxxxxx> > > > > > > > > > > > > > > > > > > > > Changed in V5: > > > > > > > > > > - Remove mdsc parsing helpers except the ceph_sb_to_mdsc() > > > > > > > > > > - Remove the is_opened member. > > > > > > > > > > > > > > > > > > > > Changed in V4: > > > > > > > > > > - A small fix about the total_inodes. > > > > > > > > > > > > > > > > > > > > Changed in V3: > > > > > > > > > > - Resend for V2 just forgot one patch, which is adding some helpers > > > > > > > > > > support to simplify the code. > > > > > > > > > > > > > > > > > > > > Changed in V2: > > > > > > > > > > - Add number of inodes that have opened files. > > > > > > > > > > - Remove the dir metrics and fold into files. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Xiubo Li (2): > > > > > > > > > > ceph: add ceph_sb_to_mdsc helper support to parse the mdsc > > > > > > > > > > ceph: metrics for opened files, pinned caps and opened inodes > > > > > > > > > > > > > > > > > > > > fs/ceph/caps.c | 41 +++++++++++++++++++++++++++++++++++++---- > > > > > > > > > > fs/ceph/debugfs.c | 11 +++++++++++ > > > > > > > > > > fs/ceph/dir.c | 20 +++++++------------- > > > > > > > > > > fs/ceph/file.c | 13 ++++++------- > > > > > > > > > > fs/ceph/inode.c | 11 ++++++++--- > > > > > > > > > > fs/ceph/locks.c | 2 +- > > > > > > > > > > fs/ceph/metric.c | 14 ++++++++++++++ > > > > > > > > > > fs/ceph/metric.h | 7 +++++++ > > > > > > > > > > fs/ceph/quota.c | 10 +++++----- > > > > > > > > > > fs/ceph/snap.c | 2 +- > > > > > > > > > > fs/ceph/super.h | 6 ++++++ > > > > > > > > > > 11 files changed, 103 insertions(+), 34 deletions(-) > > > > > > > > > > > > > > > > > > > Looks good. I went ahead and merge this into testing. > > > > > > > > > > > > > > > > > > Small merge conflict in quota.c, which I guess is probably due to not > > > > > > > > > basing this on testing branch. I also dropped what looks like an > > > > > > > > > unrelated hunk in the second patch. > > > > > > > > > > > > > > > > > > In the future, if you can be sure that patches you post apply cleanly to > > > > > > > > > testing branch then that would make things easier. > > > > > > > > Okay, will do it. > > > > > > > Hi Xiubo, > > > > > > > > > > > > > > There is a problem with lifetimes here. mdsc isn't guaranteed to exist > > > > > > > when ->free_inode() is called. This can lead to crashes on a NULL mdsc > > > > > > > in ceph_free_inode() in case of e.g. "umount -f". I know it was Jeff's > > > > > > > suggestion to move the decrement of total_inodes into ceph_free_inode(), > > > > > > > but it doesn't look like it can be easily deferred past ->evict_inode(). > > > > > > Okay, I will take a look. > > > > > Given that it's just a counter which we don't care about if the > > > > > mount is going away, some form of "if (mdsc)" check might do, but > > > > > need to make sure that it covers possible races, if any. > > > > > > > > > Good catch, Ilya. > > > > > > > > What may be best is to move the increment out of ceph_alloc_inode and > > > > instead put it in ceph_set_ino_cb. Then the decrement can go back into > > > > ceph_evict_inode. > > > > > > Hi Jeff, Ilya > > > > > > Checked the code, it seems in the ceph_evict_inode() we will also hit > > > the same issue . > > > > > > With the '-f' options when umounting, it will skip the inodes whose > > > i_count ref > 0. And then free the fsc/mdsc in ceph. And later the > > > iput_final() will call the ceph_evict_inode() and then ceph_free_inode(). > > > > > > Could we just check if !!(sb->s_flags & SB_ACTIVE) is false will we skip > > > the counting ? > > > > > > > Note that umount -f (MNT_FORCE) just means that ceph_umount_begin is > > called before unmounting. > > > > If what you're saying it true, then we have bigger problems. > > ceph_evict_inode does this today when ci->i_snap_realm is set: > > > > struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc; > > > > ...and then goes on to use that mdsc pointer. > > > > Now that I look, I don't think that this is a problem. ceph_kill_sb > calls generic_shutdown_super, which calls evict_inodes before the client > is torn down. That should ensure that the mdsc is still good when evict > is called. > > We will need to move the increment into the iget5_locked "set" function. > Maybe we can squash the patch below into yours? > > ----------------------8<--------------------------- > > ceph: use total_inodes to count hashed inodes instead of allocated ones > > We can't guarantee that the mdsc will still be around when free_inode is > called, so move this into evict_inode instead. The increment then will > need to be moved when the thing is hashed, so move that into the set > callback. > > Reported-by: Ilya Dryomov <idryomov@xxxxxxxxx> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > --- > fs/ceph/inode.c | 12 ++++++------ > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c > index 5b9d2ff8af34..39c13fefba8a 100644 > --- a/fs/ceph/inode.c > +++ b/fs/ceph/inode.c > @@ -42,10 +42,13 @@ static void ceph_inode_work(struct work_struct *work); > static int ceph_set_ino_cb(struct inode *inode, void *data) > { > struct ceph_inode_info *ci = ceph_inode(inode); > + struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb); > > ci->i_vino = *(struct ceph_vino *)data; > inode->i_ino = ceph_vino_to_ino_t(ci->i_vino); > inode_set_iversion_raw(inode, 0); > + percpu_counter_inc(&mdsc->metric.total_inodes); > + > return 0; > } > > @@ -425,7 +428,6 @@ static int ceph_fill_fragtree(struct inode *inode, > */ > struct inode *ceph_alloc_inode(struct super_block *sb) > { > - struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(sb); > struct ceph_inode_info *ci; > int i; > > @@ -525,17 +527,12 @@ struct inode *ceph_alloc_inode(struct super_block *sb) > > ci->i_meta_err = 0; > > - percpu_counter_inc(&mdsc->metric.total_inodes); > - > return &ci->vfs_inode; > } > > void ceph_free_inode(struct inode *inode) > { > struct ceph_inode_info *ci = ceph_inode(inode); > - struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb); > - > - percpu_counter_dec(&mdsc->metric.total_inodes); > > kfree(ci->i_symlink); > kmem_cache_free(ceph_inode_cachep, ci); > @@ -544,11 +541,14 @@ void ceph_free_inode(struct inode *inode) > void ceph_evict_inode(struct inode *inode) > { > struct ceph_inode_info *ci = ceph_inode(inode); > + struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb); I'd also remove a duplicate mdsc variable declared in ci->i_snap_realm branch. Thanks, Ilya