Hi,

I deployed Rook v0.8.3 with Ceph 12.2.7. This is a production system that has been running for a long time. For an unknown reason, the mons could no longer form a quorum, so I tried to recover the mon store from the OSDs by following the document below:

https://github.com/ceph/ceph/blob/v12.2.7/doc/rados/troubleshooting/troubleshooting-mon.rst#recovery-using-osds

After collecting the cluster map data, replacing store.db, and restarting the mon, the monitor log shows that it tried to load "000000.sst", which does not exist. The log also shows that the mon found all the .sst files during startup. The detailed log is below.

2020-07-28 09:44:38.100932 I | rook-ceph-mon2: 2020-07-28 09:44:38.100799 7f2e4abd0ec0 4 rocksdb: CURRENT file: CURRENT
2020-07-28 09:44:38.100946 I | rook-ceph-mon2:
2020-07-28 09:44:38.100951 I | rook-ceph-mon2: 2020-07-28 09:44:38.100847 7f2e4abd0ec0 4 rocksdb: IDENTITY file: IDENTITY
2020-07-28 09:44:38.100958 I | rook-ceph-mon2:
2020-07-28 09:44:38.100963 I | rook-ceph-mon2: 2020-07-28 09:44:38.100865 7f2e4abd0ec0 4 rocksdb: MANIFEST file: MANIFEST-000014 size: 284 Bytes
2020-07-28 09:44:38.100967 I | rook-ceph-mon2:
2020-07-28 09:44:38.100972 I | rook-ceph-mon2: 2020-07-28 09:44:38.100869 7f2e4abd0ec0 4 rocksdb: SST files in /var/lib/rook/rook-ceph-mon2/data/store.db dir, Total Num: 3, files: 000004.sst 000007.sst 000010.sst
2020-07-28 09:44:38.100976 I | rook-ceph-mon2:
2020-07-28 09:44:38.100981 I | rook-ceph-mon2: 2020-07-28 09:44:38.100872 7f2e4abd0ec0 4 rocksdb: Write Ahead Log file in /var/lib/rook/rook-ceph-mon2/data/store.db: 000015.log size: 0 ;
2020-07-28 09:44:38.100985 I | rook-ceph-mon2:
2020-07-28 09:44:38.100989 I | rook-ceph-mon2: 2020-07-28 09:44:38.100874 7f2e4abd0ec0 4 rocksdb: Options.error_if_exists: 0
...
2020-07-28 09:44:38.101528 I | rook-ceph-mon2: 2020-07-28 09:44:38.101467 7f2e4abd0ec0 4 rocksdb: Fast CRC32 supported: 1
2020-07-28 09:44:38.104667 I | rook-ceph-mon2: 2020-07-28 09:44:38.104317 7f2e4abd0ec0 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/version_set.cc:2609] Recovering from manifest file: MANIFEST-000014
2020-07-28 09:44:38.104726 I | rook-ceph-mon2:
2020-07-28 09:44:38.104926 I | rook-ceph-mon2: 2020-07-28 09:44:38.104582 7f2e4abd0ec0 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/column_family.cc:407] --------------- Options for column family [default]:
...
2020-07-28 09:44:38.105633 I | rook-ceph-mon2: 2020-07-28 09:44:38.104857 7f2e4abd0ec0 4 rocksdb: Options.report_bg_io_stats: 0
2020-07-28 09:44:38.111205 I | rook-ceph-mon2: 2020-07-28 09:44:38.110905 7f2e4abd0ec0 2 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/version_set.cc:1062] Unable to load table properties for file 0 --- IO error: /var/lib/rook/rook-ceph-mon2/data/store.db/000000.sst: No such file or directory
2020-07-28 09:44:38.111266 I | rook-ceph-mon2:
2020-07-28 09:44:38.111693 I | rook-ceph-mon2: 2020-07-28 09:44:38.110999 7f2e4abd0ec0 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/version_set.cc:2859] Recovered from manifest file:/var/lib/rook/rook-ceph-mon2/data/store.db/MANIFEST-000014 succeeded,manifest_file_number is 14, next_file_number is 17, last_sequence is 111, log_number is 0,prev_log_number is 0,max_column_family is 0
2020-07-28 09:44:38.111723 I | rook-ceph-mon2:
2020-07-28 09:44:38.111732 I | rook-ceph-mon2: 2020-07-28 09:44:38.111006 7f2e4abd0ec0 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/version_set.cc:2867] Column family [default] (ID 0), log number is 13
2020-07-28 09:44:38.111738 I | rook-ceph-mon2:
2020-07-28 09:44:38.111746 I | rook-ceph-mon2: 2020-07-28 09:44:38.111101 7f2e4abd0ec0 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl.cc:217] Shutdown: canceling all background work
2020-07-28 09:44:38.111764 I | rook-ceph-mon2: 2020-07-28 09:44:38.111214 7f2e4abd0ec0 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl.cc:343] Shutdown complete
2020-07-28 09:44:38.112077 I | rook-ceph-mon2: 2020-07-28 09:44:38.111862 7f2e4abd0ec0 -1 rocksdb: Corruption: Can't access /000000.sst: IO error: /var/lib/rook/rook-ceph-mon2/data/store.db/000000.sst: No such file or directory
2020-07-28 09:44:38.112096 I | rook-ceph-mon2:
2020-07-28 09:44:38.112103 I | rook-ceph-mon2: 2020-07-28 09:44:38.111941 7f2e4abd0ec0 -1 error opening mon data directory at '/var/lib/rook/rook-ceph-mon2/data': (22) Invalid argument
2020-07-28 09:44:38.116813 I | rook-ceph-mon2: 2020-07-28 09:44:38.111862 7f2e4abd0ec0 -1 rocksdb: Corruption: Can't access /000000.sst: IO error: /var/lib/rook/rook-ceph-mon2/data/store.db/000000.sst: No such file or directory
2020-07-28 09:44:38.116874 I | rook-ceph-mon2:
2020-07-28 09:44:38.116883 I | rook-ceph-mon2: 2020-07-28 09:44:38.111941 7f2e4abd0ec0 -1 error opening mon data directory at '/var/lib/rook/rook-ceph-mon2/data': (22) Invalid argument
failed to run mon. failed to start mon: Failed to complete 'rook-ceph-mon2': exit status 1.

It is odd that the mon tried to load "000000.sst" even though the startup log lists the correct three .sst files. Nothing in store.db matches "000000.sst", as the directory listing and grep output below show.

Any advice on this problem? Is it possible that I performed one of the steps incorrectly? Or is there a workaround?

root@bagwig4:/var/lib/rook/rook-ceph-mon2/data# ls -al store.db/
total 92
drwxr-xr-x 2 root root   188 Jul 28 01:35 .
drwxr--r-- 3 root root    55 Jul 28 02:37 ..
-rw-r--r-- 1 root root 56547 Jul 28 02:38 000004.sst
-rw-r--r-- 1 root root  1179 Jul 28 02:38 000007.sst
-rw-r--r-- 1 root root  1243 Jul 28 02:38 000010.sst
-rw-r--r-- 1 root root     0 Jul 28 02:38 000015.log
-rw-r--r-- 1 root root    16 Jul 28 02:38 CURRENT
-rw-r--r-- 1 root root    37 Jul 28 02:38 IDENTITY
-rw-r--r-- 1 root root     0 Jul 28 02:38 LOCK
-rw-r--r-- 1 root root   284 Jul 28 02:38 MANIFEST-000014
-rw-r--r-- 1 root root  4620 Jul 28 02:38 OPTIONS-000014
-rw-r--r-- 1 root root  4620 Jul 28 02:38 OPTIONS-000017
root@bagwig4:/var/lib/rook/rook-ceph-mon2/data# find . -type f | xargs grep "000000.sst"
root@bagwig4:/var/lib/rook/rook-ceph-mon2/data# find . -type f | xargs grep "000000"
./store.db/OPTIONS-000017: delete_obsolete_files_period_micros=21600000000
./store.db/OPTIONS-000017: memtable_prefix_bloom_size_ratio=0.000000
./store.db/OPTIONS-000017: max_bytes_for_level_multiplier=10.000000
./store.db/OPTIONS-000014: delete_obsolete_files_period_micros=21600000000
./store.db/OPTIONS-000014: memtable_prefix_bloom_size_ratio=0.000000
./store.db/OPTIONS-000014: max_bytes_for_level_multiplier=10.000000

Thanks,
Jared (韦煜)
Software developer
Interested in open source software, big data, Linux

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx