On 8/17/21 6:46 PM, Ilya Dryomov wrote:
On Tue, Aug 17, 2021 at 9:58 AM <xiubli@xxxxxxxxxx> wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>
When force umounting, it will try to remove all the session caps.
If there has any capsnap is in the flushing list, the remove session
caps callback will try to release the capsnap->flush_cap memory to
"ceph_cap_flush_cachep" slab cache, while which is allocated from
kmalloc-256 slab cache.
URL: https://tracker.ceph.com/issues/52283
Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
fs/ceph/mds_client.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 00b3b0a0beb8..cb517753bb17 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1264,10 +1264,18 @@ static int remove_session_caps_cb(struct inode *inode, struct ceph_cap *cap,
spin_unlock(&ci->i_ceph_lock);
while (!list_empty(&to_remove)) {
struct ceph_cap_flush *cf;
+ struct ceph_cap_snap *capsnap;
Hi Xiubo,
Add an empty line here.
Sure.
cf = list_first_entry(&to_remove,
struct ceph_cap_flush, i_list);
list_del(&cf->i_list);
- ceph_free_cap_flush(cf);
+ if (cf->caps) {
+ ceph_free_cap_flush(cf);
+ } else if (READ_ONCE(fsc->mount_state) == CEPH_MOUNT_SHUTDOWN) {
What does this condition guard against? Are there other cases of
ceph_cap_flush being embedded that need to be handled differently
on !SHUTDOWN?
From my test this issue could only be reproduced when doing force
umount. When doing the session close it will also do this.
Checked it again, without this it can works well too. I am not very sure
whether will this code is needed.
Since removing this won't block my tests, I will remove this logic from
this patch temporarily and will keep this patch simple to resolve the
crash issue only.
For the force unmount, there have some other issues, like:
<3>[ 324.020531]
=============================================================================
<3>[ 324.020541] BUG ceph_inode_info (Tainted: G E --------- - -):
Objects remaining in ceph_inode_info on __kmem_cache_shutdown()
<3>[ 324.020544]
-----------------------------------------------------------------------------
<3>[ 324.020544]
<4>[ 324.020549] Disabling lock debugging due to kernel taint
<3>[ 324.020553] INFO: Slab 0x000000007ac655b7 objects=20 used=1
fp=0x00000000ab658885 flags=0x17ffffc0008100
<4>[ 324.020559] CPU: 30 PID: 5124 Comm: rmmod Kdump: loaded Tainted:
G B E --------- - - 4.18.0+ #10
<4>[ 324.020561] Hardware name: Red Hat RHEV Hypervisor, BIOS
1.11.0-2.el7 04/01/2014
<4>[ 324.020563] Call Trace:
<4>[ 324.020745] dump_stack+0x5c/0x80
<4>[ 324.020829] slab_err+0xb0/0xd4
<4>[ 324.020872] ? ksm_migrate_page+0x60/0x60
<4>[ 324.020876] ? __kmalloc+0x16f/0x210
<4>[ 324.020879] ? __kmem_cache_shutdown+0x238/0x290
<4>[ 324.020883] __kmem_cache_shutdown.cold.102+0x1c/0x10d
<4>[ 324.020897] shutdown_cache+0x15/0x200
<4>[ 324.020928] kmem_cache_destroy+0x21f/0x250
<4>[ 324.020957] destroy_caches+0x16/0x52 [ceph]
<4>[ 324.021008] __x64_sys_delete_module+0x139/0x270
<4>[ 324.021075] do_syscall_64+0x5b/0x1b0
<4>[ 324.021143] entry_SYSCALL_64_after_hwframe+0x65/0xca
<4>[ 324.021174] RIP: 0033:0x148c0b2c2a8b
I will fix all the memleak issues in a separate patch later.
Should capsnaps be on to_remove list in the first place?
This sounds like stable material to me.
Thanks,
Ilya