Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption


Hey Sebastian,

On 12/22/2021 1:53 AM, Sebastian Mazza wrote:

9) Would you be able to run some long-lasting (and potentially data-corrupting) experiments on this cluster in an attempt to pinpoint the issue? I'm thinking about periodic OSD shutdown under load to catch the corrupting event, with a raised debug level for that specific OSD. The major problem with debugging this bug is that we can see its consequences, but we have no clue about what was happening when the actual corruption occurred. Hence we need to reproduce that somehow. So please let me know if we can use your cluster/help for that...
I want to help. Destroying the data on the cluster is not a problem. The question is whether I can find enough time, but I will do what I can. So, you are welcome to give me detailed instructions on what I should test.
One thing that could be important: I don’t think there was a significant load on the OSDs when this problem happened.

So we want to reproduce the same issue with more verbose logging. Hence my suggestion for the first attempt would be to restart the cluster in the same manner as you did before, with some preceding steps:

1) Bring the cluster back to a healthy state by redeploying the broken OSDs.

2) Inject verbose bluefs/bdev logging shortly before the restart. Do not leave the cluster at these debug levels for long, as the logs can consume a lot of disk space:

ceph tell osd.* injectargs "--debug-bluefs 20 --debug-bdev 20"

3) Put some load on the cluster to force disk writes.

4) Restart the cluster and check the OSD status on completion. If any OSDs are broken, save the relevant logs.
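
For convenience, steps 2 to 4 could be wrapped in a few shell helpers, roughly as sketched below. This is only a sketch under assumptions: the pool name `testpool`, the use of `rados bench` as the load generator, the destination directory `/root/osd-logs` and the log path `/var/log/ceph/ceph-osd.<id>.log` (the usual default for non-containerized deployments) are placeholders and may not match your setup. The restart itself is deliberately not scripted, since it should be done the same way as before. With `DRY_RUN=1` (the default here) the helpers only print the commands they would run.

```shell
#!/usr/bin/env bash
# Sketch of steps 2-4. DRY_RUN=1 (default) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

# Print a command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Step 2: raise bluefs/bdev debug levels on all OSDs (command from this mail).
enable_verbose_logging() {
    run ceph tell 'osd.*' injectargs "--debug-bluefs 20 --debug-bdev 20"
}

# Step 3: generate write load.
# Assumption: rados bench against a scratch pool; any write workload would do.
generate_write_load() {
    local pool="${1:-testpool}" seconds="${2:-60}"
    run rados bench -p "$pool" "$seconds" write --no-cleanup
}

# Step 4 (run after the restart): show OSD and cluster status.
check_osds() {
    run ceph osd tree
    run ceph health detail
}

# Step 4: keep a copy of the log of a broken OSD.
# Assumption: default log location; adjust for containerized deployments.
save_osd_log() {
    local id="$1" dest="${2:-/root/osd-logs}"
    run mkdir -p "$dest"
    run cp "/var/log/ceph/ceph-osd.${id}.log" "$dest/"
}
```

A possible sequence would be: source the file, call `enable_verbose_logging` and `generate_write_load`, restart the cluster as before, then call `check_osds` and `save_osd_log <id>` for every OSD that failed to start. Set `DRY_RUN=0` to actually execute the commands.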


Maybe try the above multiple times if every OSD is fine after the reboot.


Thanks in advance,

--
Igor Fedotov
Ceph Lead Developer

