On 12/20/2021 2:58 PM, Andrej Filipcic wrote:
On 12/20/21 12:47, Igor Fedotov wrote:
Thanks for the info.
Just in case - is write caching disabled for the disk in question?
What's the output for "hdparm -W </path-to-disk-dev>" ?
no, it is enabled. Shall I disable that on all OSDs?
I can't tell you for sure whether this is the root cause. Generally, upstream
recommends disabling write caching due to multiple performance issues
we've observed. I don't recall any report about data corruption, though, but
I can still imagine something like that. On the other hand, as far as I
could see from the initial log, there was no node reboot/shutdown
during the upgrade, hence hardware write caching is unlikely to be involved.
Am I right that there was no node shutdown in your case?
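For reference, a minimal sketch of how to check and disable the volatile write cache with hdparm. The device path is a placeholder; substitute each OSD's backing disk. Note that the setting typically does not survive a reboot, so it would need to be reapplied (e.g. via a udev rule).

```shell
# Hypothetical device path - replace with your OSD's backing disk.
DEV=/dev/sdX

# Query the current write-caching state (look for "write-caching = 1 (on)").
hdparm -W "$DEV"

# Disable the drive's volatile write cache.
hdparm -W 0 "$DEV"
```

This only affects the disk's own volatile cache; a controller with a battery-backed cache is a separate consideration.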
And it would be an interesting experiment to see whether the data corruption
is indeed related. So it would be great if you could test that...
One more question please - is this a bare metal deployment or
containerized (Rook?) one?
And I presume OSD restart is a rare event in your cluster, isn't it?
That's why you probably haven't faced the issue before...
Thanks in advance,
--
Igor Fedotov
Ceph Lead Developer
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx