Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

Hi Sebastian,

Could you please share the startup log of the failing OSD?


Thanks,

Igor

On 2/20/2022 5:10 PM, Sebastian Mazza wrote:
Hi Igor,

It happened again. One of the OSDs that crashed last time has a corrupted RocksDB again. Unfortunately, I again have no debug logs from the OSDs. I had been collecting hundreds of gigabytes of OSD debug logs over the last two months, but this week I disabled debug logging because I was running tests with rsync to CephFS and RBD images on EC pools, and the logs filled up my boot drives multiple times.
The corruption happened after I shut down all 3 nodes and booted them again a few minutes later.

If you are interested, I could share the normal log of the OSD, a log of a failed OSD start with debug logging enabled, and also the corrupted RocksDB export.

It may be worth noting that no crash happened after hundreds of reboots, but now it happened after I gracefully shut down all nodes for around 10 minutes.
To the best of my knowledge, there was no IO on the crashed OSD for several hours. The crashed OSD was used by only two pools, both of them EC pools. One is used as the data part for an RBD image and one as data storage for a subdirectory of a CephFS. All metadata for the CephFS and the RBD pool is stored on replicated NVMe drives.
One RBD image on the HDD EC pool was mounted by a VM, but not as its boot drive. The CephFS was also mounted by this VM and by the 3 cluster nodes themselves. Apart from mounting/unmounting, neither the CephFS nor the BTRFS on the RBD image was asked to process any IO. So nobody was reading from or writing to the failed OSD for many hours before the cluster shutdown and the OSD failure.


I’m now thinking about how I could add more storage space for the log files on each node, so that I can leave debug logging enabled all the time.


Best regards,
Sebastian

--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
