We have been using a CephFS pool to store machine data; the data is not overly critical at this time. But it has grown to around 8TB, and we started to see kernel panics on the hosts that had the mounts in place.

Now when we try to start the MDSs, they cycle through active, replay and clientreplay about 10 times and then just fail in an active (laggy) state. So I deleted the MDSs.

(docker-croit)@us-croit-enc-deploy01 ~ $ ceph fs dump
dumped fsmap epoch 5307
e5307
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 1

Filesystem 'cephfs' (1)
fs_name cephfs
epoch   5307
flags   12
created 2019-10-26 20:43:02.087584
modified        2019-10-26 21:35:17.285598
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
min_compat_client       -1 (unspecified)
last_failure    0
last_failure_osd_epoch  2122066
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 1
in      0
up      {0=267576193}
failed
damaged
stopped
data_pools      [5,14]
metadata_pool   3
inline_data     disabled
balancer
standby_count_wanted    1
267576193:      v1:100.129.255.186:6800/1355970155 'us-ceph-enc-svc02' mds.0.5301 up:active seq 16 laggy since 2019-10-26 21:12:08.027863

That looks OK. Then I ran:

(docker-croit)@us-croit-enc-deploy01 ~ $ ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data us_enc_datarefuge_001a ]

(docker-croit)@us-croit-enc-deploy01 ~ $ cephfs-journal-tool --rank cephfs:all journal export lee.bak
journal is 523855986688~99768
wrote 99768 bytes at offset 523855986688 to lee.bak
NOTE: this is a _sparse_ file; you can
        $ tar cSzf lee.bak.tgz lee.bak
to efficiently compress it while preserving sparseness.

(docker-croit)@us-croit-enc-deploy01 ~ $ cephfs-journal-tool --rank cephfs:all event recover_dentries summary
Events by type:
  RESETJOURNAL: 1
  SESSION: 363
  SESSIONS: 17
  UPDATE: 14
Errors: 0

(docker-croit)@us-croit-enc-deploy01 ~ $ cephfs-journal-tool --rank cephfs:all journal reset
old journal was 523855986688~99768
new journal start will be 523860180992 (4094536 bytes past old end)
writing journal head
writing EResetJournal entry
done

(docker-croit)@us-croit-enc-deploy01 ~ $ cephfs-table-tool all reset session
{
    "0": {
        "data": {},
        "result": 0
    }
}

(docker-croit)@us-croit-enc-deploy01 ~ $ cephfs-table-tool all reset snap
{
    "result": 0
}

(docker-croit)@us-croit-enc-deploy01 ~ $ cephfs-table-tool all reset inode
{
    "0": {
        "data": {},
        "result": 0
    }
}

Then I re-add the MDSs and we go back round in the same circle.

Am I missing something? Do I need to drop the metadata and recreate it, maybe? If it comes to it I can drop all the data and start over, but I don't really want to.
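If rebuilding the metadata is the way to go, my reading of the disaster-recovery docs is that the next step after the journal/table resets above would be roughly the following. I haven't run any of this yet, so treat it as a sketch of what I think the procedure is rather than something I've verified; the pool name is just our first data pool from the ceph fs ls output above, and I'm not sure how the second data pool (us_enc_datarefuge_001a) fits in:

# with all MDSs stopped
ceph fs reset cephfs --yes-i-really-mean-it   # reset the MDS map back to a single rank
cephfs-data-scan init                         # recreate the root/initial metadata objects
cephfs-data-scan scan_extents cephfs_data     # pass 1: recover file sizes/mtimes from data objects
cephfs-data-scan scan_inodes cephfs_data      # pass 2: inject recovered inodes back into the metadata pool
cephfs-data-scan scan_links                   # check and fix link counts

If that's the wrong direction, or there's something less destructive I should try first, I'd appreciate a pointer before I go down that road.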