Hi all,

With Ceph 14.2.5 on CentOS 7.6, RocksDB fails to open when the ceph-osd process is restarted after the OSD data disk was unplugged and plugged back in.

1) After unplugging the OSD data disk, the ceph-osd process aborted:

    -3> 2020-07-13 15:25:35.912 7f1ad7254700 -1 bdev(0x559d1134f880 /var/lib/ceph/osd/ceph-10/block) _sync_write sync_file_range error: (5) Input/output error
    -2> 2020-07-13 15:25:35.912 7f1ad9c5f700 -1 bdev(0x559d1134f880 /var/lib/ceph/osd/ceph-10/block) _aio_thread got r=-5 ((5) Input/output error)
    -1> 2020-07-13 15:25:35.917 7f1ad9c5f700 -1 /root/rpmbuild/BUILD/ceph-14.2.5-1.0.9/src/os/bluestore/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7f1ad9c5f700 time 2020-07-13 15:25:35.913821
    /root/rpmbuild/BUILD/ceph-14.2.5-1.0.9/src/os/bluestore/KernelDevice.cc: 534: ceph_abort_msg("Unexpected IO error. This may suggest a hardware issue. Please check your kernel log!")
    ceph version 14.2.5-93-g9a4f93e (9a4f93e7143bcdd5fadc88eb58bb730ae97b89c5) nautilus (stable)
    1: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0xdd) [0x559d05b6069a]
    2: (KernelDevice::_aio_thread()+0xebe) [0x559d061a54ee]
    3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x559d061a7add]
    4: (()+0x7dd5) [0x7f1ae66aedd5]
    5: (clone()+0x6d) [0x7f1ae5572ead]

2) After plugging the disk back in and restarting the ceph-osd process, RocksDB found incomplete records in its WAL and refused to open:

    2020-07-13 15:51:38.305 7f9801ef5a80 4 rocksdb: [db/db_impl_open.cc:583] Recovering log #9 mode 0
    2020-07-13 15:51:38.748 7f9801ef5a80 3 rocksdb: [db/db_impl_open.cc:518] db.wal/000009.log: dropping 2922 bytes; Corruption: missing start of fragmented record(2)
    2020-07-13 15:51:38.748 7f9801ef5a80 4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background work
    2020-07-13 15:51:38.748 7f9801ef5a80 4 rocksdb: [db/db_impl.cc:563] Shutdown complete
    2020-07-13 15:51:38.748 7f9801ef5a80 -1 rocksdb: Corruption: missing start of fragmented record(2)
    2020-07-13 15:51:38.748 7f9801ef5a80 -1 bluestore(/var/lib/ceph/osd/ceph-10) _open_db erroring opening db:
    2020-07-13 15:51:38.748 7f9801ef5a80 1 bluefs umount
    2020-07-13 15:51:38.776 7f9801ef5a80 1 fbmap_alloc 0x55c897e0a900 shutdown
    2020-07-13 15:51:38.776 7f9801ef5a80 1 bdev(0x55c898a6ce00 /var/lib/ceph/osd/ceph-10/block) close

Why does RocksDB not automatically drop these incomplete records and continue working? And once this has happened, what is the recommended way to recover?
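From reading the RocksDB code, my understanding (please correct me if I am wrong) is that how much WAL damage RocksDB tolerates is governed by its wal_recovery_mode option; the "mode 0" in the log above is kTolerateCorruptedTailRecords, which only accepts a cleanly truncated tail, not a record whose start is missing. If that is right, the mode could presumably be relaxed through bluestore_rocksdb_options. An untested sketch of what I mean, in ceph.conf (note that setting this option replaces Ceph's entire default RocksDB option string, so the defaults would have to be carried along, and kSkipAnyCorruptedRecords can silently discard acknowledged writes):

    [osd]
        # Untested assumption: have RocksDB skip corrupted WAL records
        # instead of failing the open. Risks losing committed data.
        bluestore_rocksdb_options = wal_recovery_mode=kSkipAnyCorruptedRecords

I have not tried this, so I would be glad to hear whether it is safe at all or just trades one corruption for another.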
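As for recovery, the only things I could think of trying (with the OSD stopped, paths taken from the logs above) are an offline check and repair with ceph-bluestore-tool:

    systemctl stop ceph-osd@10
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-10
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-10

and, if that cannot fix the WAL, destroying and re-deploying the OSD so it backfills from replicas (ceph osd destroy 10 --yes-i-really-mean-it, then re-create it). Is one of these the right approach, or is there a better way?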