How to deal with incomplete records in rocksdb

Hi all,
	rocksdb failed to open when the ceph-osd process was restarted after unplugging the OSD data disk, with Ceph 14.2.5 on CentOS 7.6.
	
	1) After unplugging the OSD data disk, the ceph-osd process exits:
	-3> 2020-07-13 15:25:35.912 7f1ad7254700 -1 bdev(0x559d1134f880 /var/lib/ceph/osd/ceph-10/block) _sync_write sync_file_range error: (5) Input/output error
	-2> 2020-07-13 15:25:35.912 7f1ad9c5f700 -1 bdev(0x559d1134f880 /var/lib/ceph/osd/ceph-10/block) _aio_thread got r=-5 ((5) Input/output error)
	-1> 2020-07-13 15:25:35.917 7f1ad9c5f700 -1 /root/rpmbuild/BUILD/ceph-14.2.5-1.0.9/src/os/bluestore/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7f1ad9c5f700 time 2020-07-13 15:25:35.913821
	/root/rpmbuild/BUILD/ceph-14.2.5-1.0.9/src/os/bluestore/KernelDevice.cc: 534: ceph_abort_msg("Unexpected IO error. This may suggest a hardware issue. Please check your kernel log!")

	ceph version 14.2.5-93-g9a4f93e (9a4f93e7143bcdd5fadc88eb58bb730ae97b89c5) nautilus (stable)
	1: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0xdd) [0x559d05b6069a]
	2: (KernelDevice::_aio_thread()+0xebe) [0x559d061a54ee]
	3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x559d061a7add]
	4: (()+0x7dd5) [0x7f1ae66aedd5]
	5: (clone()+0x6d) [0x7f1ae5572ead]
	
	2) After plugging the disk back in and restarting the ceph-osd process, rocksdb found that incomplete records existed and stopped working:
	2020-07-13 15:51:38.305 7f9801ef5a80  4 rocksdb: [db/db_impl_open.cc:583] Recovering log #9 mode 0
	2020-07-13 15:51:38.748 7f9801ef5a80  3 rocksdb: [db/db_impl_open.cc:518] db.wal/000009.log: dropping 2922 bytes; Corruption: missing start of fragmented record(2)
	2020-07-13 15:51:38.748 7f9801ef5a80  4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background work
	2020-07-13 15:51:38.748 7f9801ef5a80  4 rocksdb: [db/db_impl.cc:563] Shutdown complete
	2020-07-13 15:51:38.748 7f9801ef5a80 -1 rocksdb: Corruption: missing start of fragmented record(2)
	2020-07-13 15:51:38.748 7f9801ef5a80 -1 bluestore(/var/lib/ceph/osd/ceph-10) _open_db erroring opening db:
	2020-07-13 15:51:38.748 7f9801ef5a80  1 bluefs umount
	2020-07-13 15:51:38.776 7f9801ef5a80  1 fbmap_alloc 0x55c897e0a900 shutdown
	2020-07-13 15:51:38.776 7f9801ef5a80  1 bdev(0x55c898a6ce00 /var/lib/ceph/osd/ceph-10/block) close
	
	Why does rocksdb not automatically drop these incomplete records and continue working?
	In addition, when this situation occurs, what method should be used to recover?
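	From reading the RocksDB headers, the behaviour when a damaged WAL tail is found seems to be controlled by DBOptions::wal_recovery_mode. Below is a minimal standalone sketch of those modes, against a hypothetical /tmp/example-db path rather than the embedded BlueStore instance; whether and how the OSD exposes this knob (e.g. via bluestore_rocksdb_options), and whether changing it is safe for BlueStore metadata, I am not sure about, so treat it purely as an illustration of the RocksDB-side options.

	#include <cstdio>
	#include <rocksdb/db.h>
	#include <rocksdb/options.h>

	int main() {
	  rocksdb::Options options;
	  options.create_if_missing = true;
	  // kAbsoluteConsistency: abort recovery on any incomplete WAL record,
	  // roughly the failure mode shown in the log above.
	  // kPointInTimeRecovery: replay the WAL up to the first corrupted
	  // record and drop the tail.
	  // kSkipAnyCorruptedRecords: skip corrupted records and keep going,
	  // at the cost of possibly losing acknowledged writes.
	  options.wal_recovery_mode =
	      rocksdb::WALRecoveryMode::kPointInTimeRecovery;

	  rocksdb::DB* db = nullptr;
	  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/example-db", &db);
	  if (!s.ok()) {
	    // e.g. "Corruption: missing start of fragmented record"
	    std::fprintf(stderr, "open failed: %s\n", s.ToString().c_str());
	    return 1;
	  }
	  delete db;
	  return 0;
	}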
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
