Mon tried to load "000000.sst", which doesn't exist, when recovering from OSDs

Hi,
I deployed Rook v0.8.3 with Ceph 12.2.7. This is a production system that has been running for a long time.
For an unknown reason, the mons could no longer form a quorum, so I tried to restore the mon store from the OSDs by following this document:
https://github.com/ceph/ceph/blob/v12.2.7/doc/rados/troubleshooting/troubleshooting-mon.rst#recovery-using-osds

After collecting the cluster map data, replacing store.db, and restarting the mon, the monitor log showed that it tried to load "000000.sst", which does not exist. The log also showed that the mon found all of the .sst files during startup.
The detailed log is below:
2020-07-28 09:44:38.100932 I | rook-ceph-mon2: 2020-07-28 09:44:38.100799 7f2e4abd0ec0  4 rocksdb: CURRENT file:  CURRENT
2020-07-28 09:44:38.100946 I | rook-ceph-mon2:
2020-07-28 09:44:38.100951 I | rook-ceph-mon2: 2020-07-28 09:44:38.100847 7f2e4abd0ec0  4 rocksdb: IDENTITY file:  IDENTITY
2020-07-28 09:44:38.100958 I | rook-ceph-mon2:
2020-07-28 09:44:38.100963 I | rook-ceph-mon2: 2020-07-28 09:44:38.100865 7f2e4abd0ec0  4 rocksdb: MANIFEST file:  MANIFEST-000014 size: 284 Bytes
2020-07-28 09:44:38.100967 I | rook-ceph-mon2:
2020-07-28 09:44:38.100972 I | rook-ceph-mon2: 2020-07-28 09:44:38.100869 7f2e4abd0ec0  4 rocksdb: SST files in /var/lib/rook/rook-ceph-mon2/data/store.db dir, Total Num: 3, files: 000004.sst 000007.sst 000010.sst
2020-07-28 09:44:38.100976 I | rook-ceph-mon2:
2020-07-28 09:44:38.100981 I | rook-ceph-mon2: 2020-07-28 09:44:38.100872 7f2e4abd0ec0  4 rocksdb: Write Ahead Log file in /var/lib/rook/rook-ceph-mon2/data/store.db: 000015.log size: 0 ;
2020-07-28 09:44:38.100985 I | rook-ceph-mon2:
2020-07-28 09:44:38.100989 I | rook-ceph-mon2: 2020-07-28 09:44:38.100874 7f2e4abd0ec0  4 rocksdb:                         Options.error_if_exists: 0
...
2020-07-28 09:44:38.101528 I | rook-ceph-mon2: 2020-07-28 09:44:38.101467 7f2e4abd0ec0  4 rocksdb: Fast CRC32 supported: 1
2020-07-28 09:44:38.104667 I | rook-ceph-mon2: 2020-07-28 09:44:38.104317 7f2e4abd0ec0  4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/version_set.cc:2609] Recovering from manifest file: MANIFEST-000014
2020-07-28 09:44:38.104726 I | rook-ceph-mon2:
2020-07-28 09:44:38.104926 I | rook-ceph-mon2: 2020-07-28 09:44:38.104582 7f2e4abd0ec0  4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/column_family.cc:407] --------------- Options for column family [default]:
...
2020-07-28 09:44:38.105633 I | rook-ceph-mon2: 2020-07-28 09:44:38.104857 7f2e4abd0ec0  4 rocksdb:                Options.report_bg_io_stats: 0
2020-07-28 09:44:38.111205 I | rook-ceph-mon2: 2020-07-28 09:44:38.110905 7f2e4abd0ec0  2 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/version_set.cc:1062] Unable to load table properties for file 0 --- IO error: /var/lib/rook/rook-ceph-mon2/data/store.db/000000.sst: No such file or directory
2020-07-28 09:44:38.111266 I | rook-ceph-mon2:
2020-07-28 09:44:38.111693 I | rook-ceph-mon2: 2020-07-28 09:44:38.110999 7f2e4abd0ec0  4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/version_set.cc:2859] Recovered from manifest file:/var/lib/rook/rook-ceph-mon2/data/store.db/MANIFEST-000014 succeeded,manifest_file_number is 14, next_file_number is 17, last_sequence is 111, log_number is 0,prev_log_number is 0,max_column_family is 0
2020-07-28 09:44:38.111723 I | rook-ceph-mon2:
2020-07-28 09:44:38.111732 I | rook-ceph-mon2: 2020-07-28 09:44:38.111006 7f2e4abd0ec0  4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/version_set.cc:2867] Column family [default] (ID 0), log number is 13
2020-07-28 09:44:38.111738 I | rook-ceph-mon2:
2020-07-28 09:44:38.111746 I | rook-ceph-mon2: 2020-07-28 09:44:38.111101 7f2e4abd0ec0  4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl.cc:217] Shutdown: canceling all background work
2020-07-28 09:44:38.111764 I | rook-ceph-mon2: 2020-07-28 09:44:38.111214 7f2e4abd0ec0  4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl.cc:343] Shutdown complete
2020-07-28 09:44:38.112077 I | rook-ceph-mon2: 2020-07-28 09:44:38.111862 7f2e4abd0ec0 -1 rocksdb: Corruption: Can't access /000000.sst: IO error: /var/lib/rook/rook-ceph-mon2/data/store.db/000000.sst: No such file or directory
2020-07-28 09:44:38.112096 I | rook-ceph-mon2:
2020-07-28 09:44:38.112103 I | rook-ceph-mon2: 2020-07-28 09:44:38.111941 7f2e4abd0ec0 -1 error opening mon data directory at '/var/lib/rook/rook-ceph-mon2/data': (22) Invalid argument
2020-07-28 09:44:38.116813 I | rook-ceph-mon2: 2020-07-28 09:44:38.111862 7f2e4abd0ec0 -1 rocksdb: Corruption: Can't access /000000.sst: IO error: /var/lib/rook/rook-ceph-mon2/data/store.db/000000.sst: No such file or directory
2020-07-28 09:44:38.116874 I | rook-ceph-mon2:
2020-07-28 09:44:38.116883 I | rook-ceph-mon2: 2020-07-28 09:44:38.111941 7f2e4abd0ec0 -1 error opening mon data directory at '/var/lib/rook/rook-ceph-mon2/data': (22) Invalid argument
failed to run mon. failed to start mon: Failed to complete 'rook-ceph-mon2': exit status 1.

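It's a little weird that it tried to load "000000.sst" even though the mon's RocksDB listed the correct three .sst files (000004.sst, 000007.sst, 000010.sst) during startup. There is no "000000.sst" in store.db, and grepping the data directory for that name finds nothing (see below).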
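As a quick cross-check of this mismatch, here is a minimal sketch of a hypothetical bash helper (`check_missing_sst` is an illustrative name, not a Ceph, Rook, or RocksDB tool). It assumes the mon output was saved to a file and that the RocksDB lines keep the format shown in the log above; it reports, for each SST file RocksDB failed to open, whether that file appeared in the startup inventory line and whether it exists in store.db.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph, Rook, or RocksDB): cross-check SST
# files that RocksDB failed to open against the startup inventory line in the
# same log and against the files actually present in store.db.
# Usage: check_missing_sst <saved-mon-log> <path-to-store.db>
check_missing_sst() {
  local log="$1" store="$2" inventory
  # Take the file list from the first inventory line, assuming the format
  # "SST files in <dir> dir, Total Num: N, files: 000004.sst 000007.sst ..."
  inventory=$(grep 'SST files in' "$log" | head -n 1 | sed 's/.*files: //')
  # Collect unique SST names from errors such as
  # ".../store.db/000000.sst: No such file or directory"
  grep -oE '[0-9]+\.sst: No such file' "$log" | cut -d: -f1 | sort -u |
    while read -r sst; do
      listed=no
      on_disk=no
      # Pad with spaces so that only whole file names match
      case " $inventory " in *" $sst "*) listed=yes ;; esac
      if [ -e "$store/$sst" ]; then on_disk=yes; fi
      echo "$sst listed_at_startup=$listed on_disk=$on_disk"
    done
}
```

Run against a log like the one above (e.g. `check_missing_sst mon2.log /var/lib/rook/rook-ceph-mon2/data/store.db`, where `mon2.log` is a hypothetical saved copy), it should print `000000.sst listed_at_startup=no on_disk=no`, which suggests the version recovered from MANIFEST-000014 refers to a table file that is not part of the rebuilt store. Note that a plain grep for "000000.sst" would not be expected to match the MANIFEST itself, because RocksDB records table files there by number rather than by file name.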

Any advice on this problem? Could I have executed one of the steps incorrectly? Or is there a workaround?

root@bagwig4:/var/lib/rook/rook-ceph-mon2/data# ls -al store.db/
total 92
drwxr-xr-x 2 root root   188 Jul 28 01:35 .
drwxr--r-- 3 root root    55 Jul 28 02:37 ..
-rw-r--r-- 1 root root 56547 Jul 28 02:38 000004.sst
-rw-r--r-- 1 root root  1179 Jul 28 02:38 000007.sst
-rw-r--r-- 1 root root  1243 Jul 28 02:38 000010.sst
-rw-r--r-- 1 root root     0 Jul 28 02:38 000015.log
-rw-r--r-- 1 root root    16 Jul 28 02:38 CURRENT
-rw-r--r-- 1 root root    37 Jul 28 02:38 IDENTITY
-rw-r--r-- 1 root root     0 Jul 28 02:38 LOCK
-rw-r--r-- 1 root root   284 Jul 28 02:38 MANIFEST-000014
-rw-r--r-- 1 root root  4620 Jul 28 02:38 OPTIONS-000014
-rw-r--r-- 1 root root  4620 Jul 28 02:38 OPTIONS-000017
root@bagwig4:/var/lib/rook/rook-ceph-mon2/data# find . -type f | xargs grep "000000.sst"
root@bagwig4:/var/lib/rook/rook-ceph-mon2/data# find . -type f | xargs grep "000000"
./store.db/OPTIONS-000017:  delete_obsolete_files_period_micros=21600000000
./store.db/OPTIONS-000017:  memtable_prefix_bloom_size_ratio=0.000000
./store.db/OPTIONS-000017:  max_bytes_for_level_multiplier=10.000000
./store.db/OPTIONS-000014:  delete_obsolete_files_period_micros=21600000000
./store.db/OPTIONS-000014:  memtable_prefix_bloom_size_ratio=0.000000
./store.db/OPTIONS-000014:  max_bytes_for_level_multiplier=10.000000




Thanks,

Jared, (韦煜)
Software developer
Interested in open source software, big data, Linux
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx