On 2020/9/12 3:46, Jeff Layton wrote:
On Fri, 2020-09-11 at 07:49 -0400, Jeff Layton wrote:
On Fri, 2020-09-11 at 11:43 +0800, Xiubo Li wrote:
On 2020/9/10 20:13, Jeff Layton wrote:
On Thu, 2020-09-10 at 08:00 +0200, Ilya Dryomov wrote:
On Thu, Sep 10, 2020 at 2:59 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
On 2020/9/10 4:34, Ilya Dryomov wrote:
On Thu, Sep 3, 2020 at 4:22 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
On 2020/9/3 22:18, Jeff Layton wrote:
On Thu, 2020-09-03 at 09:01 -0400, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>
Changed in V5:
- Remove mdsc parsing helpers except the ceph_sb_to_mdsc()
- Remove the is_opened member.
Changed in V4:
- A small fix for total_inodes.
Changed in V3:
- Resend of V2, which was missing one patch: the one that adds some
helpers to simplify the code.
Changed in V2:
- Add number of inodes that have opened files.
- Remove the dir metrics and fold into files.
Xiubo Li (2):
ceph: add ceph_sb_to_mdsc helper support to parse the mdsc
ceph: metrics for opened files, pinned caps and opened inodes
fs/ceph/caps.c | 41 +++++++++++++++++++++++++++++++++++++----
fs/ceph/debugfs.c | 11 +++++++++++
fs/ceph/dir.c | 20 +++++++-------------
fs/ceph/file.c | 13 ++++++-------
fs/ceph/inode.c | 11 ++++++++---
fs/ceph/locks.c | 2 +-
fs/ceph/metric.c | 14 ++++++++++++++
fs/ceph/metric.h | 7 +++++++
fs/ceph/quota.c | 10 +++++-----
fs/ceph/snap.c | 2 +-
fs/ceph/super.h | 6 ++++++
11 files changed, 103 insertions(+), 34 deletions(-)
Looks good. I went ahead and merged this into testing.
There was a small merge conflict in quota.c, which I guess is probably due
to not basing this on the testing branch. I also dropped what looks like an
unrelated hunk in the second patch.
In the future, if you can be sure that patches you post apply cleanly to
the testing branch, that would make things easier.
Okay, will do it.
Hi Xiubo,
There is a problem with lifetimes here. mdsc isn't guaranteed to exist
when ->free_inode() is called. This can lead to crashes on a NULL mdsc
in ceph_free_inode() in case of e.g. "umount -f". I know it was Jeff's
suggestion to move the decrement of total_inodes into ceph_free_inode(),
but it doesn't look like it can be easily deferred past ->evict_inode().
Okay, I will take a look.
Given that it's just a counter which we don't care about if the
mount is going away, some form of "if (mdsc)" check might do, but
need to make sure that it covers possible races, if any.
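For illustration only, here is a minimal userspace sketch of the kind of "if (mdsc)" guard mentioned above. The model_* types and functions are hypothetical stand-ins rather than kernel code, a plain long replaces the percpu counter, and the sketch does not address the possible races:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-client metrics; not the kernel struct. */
struct model_mdsc {
	long total_inodes;
};

/* Hypothetical stand-in for the superblock's private data. */
struct model_sb {
	struct model_mdsc *mdsc;	/* NULL once the client has been torn down */
};

/*
 * Decrement the counter only while the mdsc still exists.
 * Returns 1 if the counter was updated, 0 if the update was skipped.
 */
int model_dec_total_inodes(struct model_sb *sb)
{
	struct model_mdsc *mdsc = sb->mdsc;

	if (!mdsc)
		return 0;	/* mount is going away; the counter no longer matters */

	mdsc->total_inodes--;
	return 1;
}
```

A check like this only avoids the NULL dereference; it says nothing about a teardown that happens between the check and the decrement.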
Good catch, Ilya.
What may be best is to move the increment out of ceph_alloc_inode and
instead put it in ceph_set_ino_cb. Then the decrement can go back into
ceph_evict_inode.
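A rough userspace model of that pairing, as a sketch only: the counter goes up in the "set" callback and comes down at evict time, so it tracks hashed inodes rather than allocated ones. All model_* names are hypothetical, and a plain long stands in for the percpu counter:

```c
#include <assert.h>

/* Hypothetical stand-in for mdsc->metric; a plain long replaces the percpu counter. */
struct model_metric {
	long total_inodes;
};

/* Hypothetical stand-in for an inode that remembers which metric it belongs to. */
struct model_inode {
	struct model_metric *metric;
	unsigned long ino;
};

/* Models the "set" callback: runs once, when the inode is set up for hashing. */
int model_set_ino_cb(struct model_inode *inode, struct model_metric *metric,
		     unsigned long ino)
{
	inode->metric = metric;
	inode->ino = ino;
	metric->total_inodes++;
	return 0;
}

/* Models evict: the decrement happens here instead of at free time. */
void model_evict_inode(struct model_inode *inode)
{
	inode->metric->total_inodes--;
}
```

In this model every increment is matched by exactly one decrement, and nothing touches the counter at allocation or free time.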
Hi Jeff, Ilya
I checked the code, and it seems we will hit the same issue in
ceph_evict_inode() as well.
When unmounting with the '-f' option, the inodes whose i_count is > 0 are
skipped, and then the fsc/mdsc is freed in ceph. Later, iput_final() will
call ceph_evict_inode() and then ceph_free_inode().
Could we just check (sb->s_flags & SB_ACTIVE) and skip the counting when
it is not set?
Note that umount -f (MNT_FORCE) just means that ceph_umount_begin is
called before unmounting.
If what you're saying is true, then we have bigger problems.
ceph_evict_inode does this today when ci->i_snap_realm is set:
struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
...and then goes on to use that mdsc pointer.
Now that I look, I don't think that this is a problem. ceph_kill_sb
calls generic_shutdown_super, which calls evict_inodes before the client
is torn down. That should ensure that the mdsc is still good when evict
is called.
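The ordering argument can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the real ceph_kill_sb or generic_shutdown_super, both of which do much more; the model_* names are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the client state that owns the counter. */
struct model_client {
	long total_inodes;
};

/* Hypothetical stand-in for a superblock with some cached inodes. */
struct model_sb {
	struct model_client *client;
	int nr_inodes;			/* inodes still cached */
	int evicted_with_client;	/* evictions that saw a live client */
};

/* Models the evict pass: every eviction here still sees a valid client. */
void model_evict_inodes(struct model_sb *sb)
{
	while (sb->nr_inodes > 0) {
		assert(sb->client != NULL);
		sb->client->total_inodes--;
		sb->nr_inodes--;
		sb->evicted_with_client++;
	}
}

/* Models the kill_sb ordering: evict first, tear the client down second. */
void model_kill_sb(struct model_sb *sb)
{
	model_evict_inodes(sb);		/* step 1: inodes are evicted */
	free(sb->client);		/* step 2: only now is the client freed */
	sb->client = NULL;
}
```

Because the evict pass runs before the client is freed, the decrement in the model never dereferences a stale pointer.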
We will need to move the increment into the iget5_locked "set" function.
Maybe we can squash the patch below into yours?
Yeah, the following patch looks good.
Thanks.
----------------------8<---------------------------
ceph: use total_inodes to count hashed inodes instead of allocated ones
We can't guarantee that the mdsc will still be around when free_inode is
called, so move the decrement into evict_inode instead. The increment then
needs to happen when the inode is hashed, so move that into the set
callback.
Reported-by: Ilya Dryomov <idryomov@xxxxxxxxx>
Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
fs/ceph/inode.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 5b9d2ff8af34..39c13fefba8a 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -42,10 +42,13 @@ static void ceph_inode_work(struct work_struct *work);
static int ceph_set_ino_cb(struct inode *inode, void *data)
{
struct ceph_inode_info *ci = ceph_inode(inode);
+ struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb);
ci->i_vino = *(struct ceph_vino *)data;
inode->i_ino = ceph_vino_to_ino_t(ci->i_vino);
inode_set_iversion_raw(inode, 0);
+ percpu_counter_inc(&mdsc->metric.total_inodes);
+
return 0;
}
@@ -425,7 +428,6 @@ static int ceph_fill_fragtree(struct inode *inode,
*/
struct inode *ceph_alloc_inode(struct super_block *sb)
{
- struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(sb);
struct ceph_inode_info *ci;
int i;
@@ -525,17 +527,12 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
ci->i_meta_err = 0;
- percpu_counter_inc(&mdsc->metric.total_inodes);
-
return &ci->vfs_inode;
}
void ceph_free_inode(struct inode *inode)
{
struct ceph_inode_info *ci = ceph_inode(inode);
- struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb);
-
- percpu_counter_dec(&mdsc->metric.total_inodes);
kfree(ci->i_symlink);
kmem_cache_free(ceph_inode_cachep, ci);
@@ -544,11 +541,14 @@ void ceph_free_inode(struct inode *inode)
void ceph_evict_inode(struct inode *inode)
{
struct ceph_inode_info *ci = ceph_inode(inode);
+ struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb);
struct ceph_inode_frag *frag;
struct rb_node *n;
dout("evict_inode %p ino %llx.%llx\n", inode, ceph_vinop(inode));
+ percpu_counter_dec(&mdsc->metric.total_inodes);
+
truncate_inode_pages_final(&inode->i_data);
clear_inode(inode);