Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption


 



Hey Igor,

thank you for your response!

>> 
>> Do you suggest disabling HDD write caching and/or bluefs_buffered_io for production clusters?
>> 
> Generally, the upstream recommendation is to disable disk write caching; there were multiple complaints that it might negatively impact performance in some setups.
> 
> As for bluefs_buffered_io - please keep it on; disabling it is known to cause a performance drop.

Thanks for the explanation. For the enabled disk write cache you only mentioned possible performance problems, but can an enabled disk write cache also lead to data corruption, or at least make corruption more likely than with the cache disabled?
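To see what the kernel currently reports for each drive's write cache, a minimal sketch like the following could be used. It assumes a Linux host that exposes the `queue/write_cache` attribute under `/sys/block` (the value is typically `write back` or `write through`); the function name `write_cache_modes` is made up for illustration, and the value reflects the kernel's view, which may not match what a RAID controller or the drive firmware actually does.

```python
from pathlib import Path


def write_cache_modes(sys_block="/sys/block"):
    """Return {device_name: mode} for block devices exposing queue/write_cache.

    Devices without the attribute are skipped. An empty dict is returned
    when the sysfs directory does not exist (e.g. on a non-Linux host).
    """
    root = Path(sys_block)
    if not root.is_dir():
        return {}
    modes = {}
    for dev in sorted(root.iterdir()):
        attr = dev / "queue" / "write_cache"
        if attr.is_file():
            modes[dev.name] = attr.read_text().strip()
    return modes


if __name__ == "__main__":
    for name, mode in write_cache_modes().items():
        print(f"{name}: {mode}")
```

This only reads the current state; it does not change any setting.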

> 
>> 
>>> When rebooting a node, did you perform it with a regular OS command (reboot or poweroff) or with a power switch?
>> I never did a hard reset or used the power switch. I used `init 6` to perform the reboot. Each server has redundant power supplies, one connected to a battery backup and the other to the grid. Therefore, I believe that none of the servers ever faced an unclean shutdown or reboot.
>> 
> So the original reboot which caused the failures was made in the same manner, right?

Yes, exactly.
And the OSD logs confirm that:

OSD 4:
2021-12-12T21:33:07.780+0100 7f464a944700 -1 received  signal: Terminated from /sbin/init  (PID: 1) UID: 0
2021-12-12T21:33:07.780+0100 7f464a944700 -1 osd.4 2606 *** Got signal Terminated ***
2021-12-12T21:33:07.780+0100 7f464a944700 -1 osd.4 2606 *** Immediate shutdown (osd_fast_shutdown=true) ***
2021-12-12T21:35:29.918+0100 7ffa5ce42f00  0 set uid:gid to 64045:64045 (ceph:ceph)
2021-12-12T21:35:29.918+0100 7ffa5ce42f00  0 ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, pid 1608
:...
2021-12-12T21:35:32.509+0100 7ffa5ce42f00 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 0 in db/002145.sst
2021-12-12T21:35:32.509+0100 7ffa5ce42f00 -1 bluestore(/var/lib/ceph/osd/ceph-4) _open_db erroring opening db: 


OSD 7:
2021-12-12T21:20:11.141+0100 7f9714894700 -1 received  signal: Terminated from /sbin/init  (PID: 1) UID: 0
2021-12-12T21:20:11.141+0100 7f9714894700 -1 osd.7 2591 *** Got signal Terminated ***
2021-12-12T21:20:11.141+0100 7f9714894700 -1 osd.7 2591 *** Immediate shutdown (osd_fast_shutdown=true) ***
2021-12-12T21:21:41.881+0100 7f63c6557f00  0 set uid:gid to 64045:64045 (ceph:ceph)
2021-12-12T21:21:41.881+0100 7f63c6557f00  0 ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, pid 1937
:...
2021-12-12T21:21:44.557+0100 7f63c6557f00 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 0 in db/002182.sst
2021-12-12T21:21:44.557+0100 7f63c6557f00 -1 bluestore(/var/lib/ceph/osd/ceph-7) _open_db erroring opening db: 


OSD 8:
2021-12-12T21:20:11.141+0100 7fd1ccf01700 -1 received  signal: Terminated from /sbin/init  (PID: 1) UID: 0
2021-12-12T21:20:11.141+0100 7fd1ccf01700 -1 osd.8 2591 *** Got signal Terminated ***
2021-12-12T21:20:11.141+0100 7fd1ccf01700 -1 osd.8 2591 *** Immediate shutdown (osd_fast_shutdown=true) ***
2021-12-12T21:21:41.881+0100 7f6d18d2bf00  0 set uid:gid to 64045:64045 (ceph:ceph)
2021-12-12T21:21:41.881+0100 7f6d18d2bf00  0 ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, pid 1938
:...
2021-12-12T21:21:44.577+0100 7f6d18d2bf00 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 0 in db/002182.sst
2021-12-12T21:21:44.577+0100 7f6d18d2bf00 -1 bluestore(/var/lib/ceph/osd/ceph-8) _open_db erroring opening db: 
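
For anyone comparing similar logs, here is a rough sketch of how the relevant facts can be pulled out of such lines: who sent the signal, which OSD shut down, and which SST file failed the magic-number check. The regular expressions are written against the exact log lines quoted above and may need adjusting for other Ceph versions. The second helper reads the last 8 bytes of a file as a little-endian integer, which is where RocksDB keeps the table magic number; it assumes the SST file has first been copied out of BlueFS to a regular file (for example with `ceph-bluestore-tool bluefs-export`). The names `summarize` and `trailing_magic` are illustrative only.

```python
import re
import struct

# Value copied from the "expected ..." part of the log messages above.
EXPECTED_MAGIC = 9863518390377041911

SIGNAL_RE = re.compile(r"received\s+signal: (\w+) from (\S+)")
SHUTDOWN_RE = re.compile(r"osd\.(\d+) \d+ \*\*\* Immediate shutdown")
CORRUPT_RE = re.compile(
    r"rocksdb: Corruption: Bad table magic number: "
    r"expected (\d+), found (\d+) in (\S+)"
)


def summarize(lines):
    """Collect signal, OSD id and corruption details from OSD log lines."""
    info = {"osd": None, "signal": None, "sender": None, "corruption": None}
    for line in lines:
        if m := SIGNAL_RE.search(line):
            info["signal"], info["sender"] = m.group(1), m.group(2)
        elif m := SHUTDOWN_RE.search(line):
            info["osd"] = int(m.group(1))
        elif m := CORRUPT_RE.search(line):
            info["corruption"] = {
                "expected": int(m.group(1)),
                "found": int(m.group(2)),
                "file": m.group(3),
            }
    return info


def trailing_magic(path):
    """Return the last 8 bytes of a file as a little-endian unsigned integer.

    Returns None if the file is shorter than 8 bytes.
    """
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to end of file to learn its size
        if f.tell() < 8:
            return None
        f.seek(-8, 2)
        return struct.unpack("<Q", f.read(8))[0]
```

With an exported file, `trailing_magic(path) == EXPECTED_MAGIC` would indicate an intact trailer, while a result of 0 would match the "found 0" in the messages above, i.e. the end of the file reads back as zeros.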



Best regards,
Sebastian




