On 1/26/2022 1:18 AM, Sebastian Mazza wrote:
Hey Igor,
thank you for your response!
Do you suggest disabling the HDD write caching and/or bluefs_buffered_io for production clusters?
The general upstream recommendation is to disable disk write caching; there have been multiple complaints that it can negatively impact performance in some setups.
As for bluefs_buffered_io - please keep it on; disabling it is known to cause a performance drop.
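For reference, the current value can be checked and changed with the usual config commands; this is only a minimal sketch, assuming the cluster-wide OSD default is what should be set:

  # show the current default for all OSDs
  ceph config get osd bluefs_buffered_io
  # keep buffered BlueFS reads on, as recommended above
  ceph config set osd bluefs_buffered_io true
  # verify the running value on a specific OSD via its admin socket
  ceph daemon osd.4 config get bluefs_buffered_io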
Thanks for the explanation. For the enabled disk write cache you only mentioned a possible performance problem, but can an enabled disk write cache also lead to data corruption, or make such a problem more likely than with a disabled disk cache?
It definitely can, particularly if the cache isn't protected from power loss or the implementation isn't so good ;)
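For what it's worth, inspecting and, if desired, turning off the volatile write cache on a rotational drive could look like the following; /dev/sdX stands for the actual device, and on some drives the setting may need to be reapplied after a power cycle:

  # query the current write-caching setting of a SATA drive
  hdparm -W /dev/sdX
  # disable the volatile write cache
  hdparm -W 0 /dev/sdX
  # for SAS drives, sdparm can be used instead
  sdparm --get=WCE /dev/sdX
  sdparm --clear=WCE /dev/sdX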
When rebooting a node - did you perform it with a regular OS command (reboot or poweroff) or with a power switch?
I never did a hard reset or used the power switch. I used `init 6` to perform a reboot. Each server has redundant power supplies, one connected to a battery backup and the other to the grid. Therefore, I do not think that any of the servers ever faced an unclean shutdown or reboot.
So the original reboot that caused the failures was performed in the same manner, right?
Yes, exactly.
And the OSD logs confirm that:
OSD 4:
2021-12-12T21:33:07.780+0100 7f464a944700 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0
2021-12-12T21:33:07.780+0100 7f464a944700 -1 osd.4 2606 *** Got signal Terminated ***
2021-12-12T21:33:07.780+0100 7f464a944700 -1 osd.4 2606 *** Immediate shutdown (osd_fast_shutdown=true) ***
2021-12-12T21:35:29.918+0100 7ffa5ce42f00 0 set uid:gid to 64045:64045 (ceph:ceph)
2021-12-12T21:35:29.918+0100 7ffa5ce42f00 0 ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, pid 1608
:...
2021-12-12T21:35:32.509+0100 7ffa5ce42f00 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 0 in db/002145.sst
2021-12-12T21:35:32.509+0100 7ffa5ce42f00 -1 bluestore(/var/lib/ceph/osd/ceph-4) _open_db erroring opening db:
OSD 7:
2021-12-12T21:20:11.141+0100 7f9714894700 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0
2021-12-12T21:20:11.141+0100 7f9714894700 -1 osd.7 2591 *** Got signal Terminated ***
2021-12-12T21:20:11.141+0100 7f9714894700 -1 osd.7 2591 *** Immediate shutdown (osd_fast_shutdown=true) ***
2021-12-12T21:21:41.881+0100 7f63c6557f00 0 set uid:gid to 64045:64045 (ceph:ceph)
2021-12-12T21:21:41.881+0100 7f63c6557f00 0 ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, pid 1937
:...
2021-12-12T21:21:44.557+0100 7f63c6557f00 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 0 in db/002182.sst
2021-12-12T21:21:44.557+0100 7f63c6557f00 -1 bluestore(/var/lib/ceph/osd/ceph-7) _open_db erroring opening db:
OSD 8:
2021-12-12T21:20:11.141+0100 7fd1ccf01700 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0
2021-12-12T21:20:11.141+0100 7fd1ccf01700 -1 osd.8 2591 *** Got signal Terminated ***
2021-12-12T21:20:11.141+0100 7fd1ccf01700 -1 osd.8 2591 *** Immediate shutdown (osd_fast_shutdown=true) ***
2021-12-12T21:21:41.881+0100 7f6d18d2bf00 0 set uid:gid to 64045:64045 (ceph:ceph)
2021-12-12T21:21:41.881+0100 7f6d18d2bf00 0 ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, pid 1938
:...
2021-12-12T21:21:44.577+0100 7f6d18d2bf00 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 0 in db/002182.sst
2021-12-12T21:21:44.577+0100 7f6d18d2bf00 -1 bluestore(/var/lib/ceph/osd/ceph-8) _open_db erroring opening db:
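In case it helps with further debugging, one possible way to gather more detail on such a corrupted RocksDB would be an offline consistency check and an export of the BlueFS files; this is only a sketch, assuming the affected OSD is stopped and using the data paths shown in the logs above:

  # consistency check of the BlueStore / RocksDB metadata
  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-4
  # export the BlueFS files (including the .sst files) for offline inspection
  ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-4 --out-dir /tmp/osd.4-bluefs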
Best regards,
Sebastian
--
Igor Fedotov
Ceph Lead Developer
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx