MDS stuck in rejoin

Hi Everyone,

I have a system that is down, with the MDS stuck in the rejoin state. When I run ceph-mds with -d and --debug_mds 10, I get this output, repeating:

2022-05-31 00:33:03.554 7fac80ee3700 10 mds.trex-ceph4      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dir frag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
2022-05-31 00:33:03.554 7fac80ee3700 10 mds.trex-ceph4  mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dir frag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
2022-05-31 00:33:03.554 7fac80ee3700 10 mds.trex-ceph4 my gid is 161986332
2022-05-31 00:33:03.554 7fac80ee3700 10 mds.trex-ceph4 map says I am mds.0.2365745 state up:rejoin
2022-05-31 00:33:03.554 7fac80ee3700 10 mds.trex-ceph4 msgr says i am [v2:172.23.0.44:6800/4094836140,v1:172.23.0.44:6801/4094836140]
2022-05-31 00:33:03.554 7fac80ee3700 10 mds.trex-ceph4 handle_mds_map: handling map as rank 0
2022-05-31 00:33:03.557 7fac83972700  5 mds.beacon.trex-ceph4 received beacon reply up:rejoin seq 31 rtt 0.21701
2022-05-31 00:33:04.185 7fac7c6da700 10 mds.0.cache cache not ready for trimming
2022-05-31 00:33:05.182 7fac7c6da700 10 mds.0.cache cache not ready for trimming
2022-05-31 00:33:05.182 7fac7c6da700 10 mds.0.cache releasing free memory
2022-05-31 00:33:06.182 7fac7c6da700 10 mds.0.cache cache not ready for trimming
2022-05-31 00:33:07.183 7fac7c6da700 10 mds.0.cache cache not ready for trimming
2022-05-31 00:33:07.341 7fac7dedd700  5 mds.beacon.trex-ceph4 Sending beacon up:rejoin seq 32
2022-05-31 00:33:07.341 7fac83972700  5 mds.beacon.trex-ceph4 received beacon reply up:rejoin seq 32 rtt 0
2022-05-31 00:33:08.183 7fac7c6da700 10 mds.0.cache cache not ready for trimming
2022-05-31 00:33:09.184 7fac7c6da700 10 mds.0.cache cache not ready for trimming
2022-05-31 00:33:10.184 7fac7c6da700 10 mds.0.cache cache not ready for trimming
2022-05-31 00:33:11.185 7fac7c6da700 10 mds.0.cache cache not ready for trimming
2022-05-31 00:33:11.341 7fac7dedd700  5 mds.beacon.trex-ceph4 Sending beacon up:rejoin seq 33
2022-05-31 00:33:11.397 7fac80ee3700  1 mds.trex-ceph4 Updating MDS map to version 2365758 from mon.0

It just stays in that state, seemingly forever, and it doesn't appear to be using any CPU. I don't even know where to look at this point.
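The "my compat" and "mdsmap compat" lines in the debug output are long and hard to compare by eye. Below is a minimal sketch in Python (a choice made for illustration; nothing in the post uses code) that pulls the incompat feature sets out of those two lines and lists the feature IDs that appear in only one of them. It assumes the exact line format shown in the log above; `parse_incompat` and `diff_incompat` are hypothetical helper names, not part of Ceph. It only reports differences and cannot tell you whether a given difference matters.

```python
import re

# The two compat lines copied verbatim from the debug_mds 10 output above.
LOG = """\
2022-05-31 00:33:03.554 7fac80ee3700 10 mds.trex-ceph4      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dir frag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
2022-05-31 00:33:03.554 7fac80ee3700 10 mds.trex-ceph4  mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dir frag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
"""

# Assumed format: "<my|mdsmap> compat compat={...},rocompat={...},incompat={...}".
# Group 1 is the label, group 2 is the body of the incompat set.
COMPAT_RE = re.compile(
    r"(my|mdsmap) compat compat=\{[^}]*\},rocompat=\{[^}]*\},incompat=\{([^}]*)\}"
)


def parse_incompat(log: str) -> dict[str, dict[int, str]]:
    """Return {'my': {id: name, ...}, 'mdsmap': {id: name, ...}} from the log text."""
    result: dict[str, dict[int, str]] = {}
    for m in COMPAT_RE.finditer(log):
        label, body = m.group(1), m.group(2)
        features: dict[int, str] = {}
        # Entries look like "7=mds uses inline data"; names contain no commas.
        for item in body.split(","):
            fid, _, name = item.partition("=")
            features[int(fid)] = name
        result[label] = features
    return result


def diff_incompat(features: dict[str, dict[int, str]]):
    """Return (only in 'my' compat, only in 'mdsmap' compat), keyed by feature ID."""
    mine, mapped = features["my"], features["mdsmap"]
    only_mine = {k: mine[k] for k in sorted(mine.keys() - mapped.keys())}
    only_map = {k: mapped[k] for k in sorted(mapped.keys() - mine.keys())}
    return only_mine, only_map


if __name__ == "__main__":
    only_mine, only_map = diff_incompat(parse_incompat(LOG))
    print("only in my compat:", only_mine)
    print("only in mdsmap compat:", only_map)
```

Run against the two lines from this post, it prints `only in my compat: {7: 'mds uses inline data'}` and `only in mdsmap compat: {}`, i.e. the only difference is feature 7, which the running MDS lists and the MDS map does not.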

I see this in the mon log:

2022-05-31 00:36:27.359 7f39d0c6c700  1 mon.trex-ceph1@0(leader).osd e51026 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 301989888 full_alloc: 322961408 kv_alloc: 390070272

I'm falling asleep at the keyboard trying to get this to work. Any thoughts?

Thanks

-Dave

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx