On 07/11/2022 18:39, Luís Henriques wrote:
On Mon, Nov 07, 2022 at 03:17:59PM +0800, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>
When decoding the snaps fails, 'first_realm' and 'realm' may be left
pointing to the same snaprealm memory. The error path then puts that
realm twice, which can cause random use-after-free, BUG_ON and other
issues.
Cc: stable@xxxxxxxxxxxxxxx
URL: https://tracker.ceph.com/issues/57686
Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
fs/ceph/snap.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index 9bceed2ebda3..baf17df05107 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -849,10 +849,12 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
 	if (realm_to_rebuild && p >= e)
 		rebuild_snap_realms(realm_to_rebuild, &dirty_realms);
 
-	if (!first_realm)
+	if (!first_realm) {
 		first_realm = realm;
-	else
+		realm = NULL;
+	} else {
 		ceph_put_snap_realm(mdsc, realm);
+	}
 
 	if (p < e)
 		goto more;
--
2.31.1
This patch looks correct to me. But I wonder if there's a deeper problem
there (probably not in the kernel client), because the other question is:
why are we failing to decode the snaps in the first place? Still, I think
this fix is worth having anyway.
Yeah, good question.
The MDS also crashed [1][2] just seconds before the kernel crash was
triggered, and the CephFS metadata had been corrupted for some reason.
[1] https://tracker.ceph.com/issues/56140
[2] https://tracker.ceph.com/issues/54546
Thanks!
- Xiubo
Reviewed-by: Luís Henriques <lhenriques@xxxxxxx>
Cheers,
--
Luís