Re: 16.2.7 pacific rocksdb Corruption: CURRENT


On 12/20/2021 2:58 PM, Andrej Filipcic wrote:
On 12/20/21 12:47, Igor Fedotov wrote:

Thanks for the info.

Just in case - is write caching disabled for the disk in question? What's the output of "hdparm -W </path-to-disk-dev>"?

no, it is enabled. Shall I disable that on all OSDs?

I can't tell you for sure whether this is the root cause. Upstream generally recommends disabling write caching because of multiple performance issues we have observed. I don't recall any reports of data corruption, though I can still imagine something like that. On the other hand, as far as I could see from the initial log, there was apparently no node reboot/shutdown during the upgrade, hence hardware write caching is unlikely to be involved. Am I right that there was no node shutdown in your case?

And it would be an interesting experiment to see whether the data corruption is indeed related to write caching. So it would be great if you could test that...
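
For checking many OSD disks at once, the output of "hdparm -W" can be parsed rather than read by eye. The following is only a minimal sketch, assuming the usual status line format printed by hdparm (for example " write-caching =  1 (on)"); the function names parse_write_cache and query_write_cache are hypothetical helpers made up for illustration, and the device path has to be supplied by the caller.

```python
import re
import subprocess

# Matches the status line printed by `hdparm -W <dev>`,
# e.g. " write-caching =  1 (on)" or " write-caching =  0 (off)".
_WRITE_CACHE_RE = re.compile(r"write-caching\s*=\s*(\d)\s*\((on|off)\)")


def parse_write_cache(hdparm_output):
    """Return True if write caching is on, False if off, None if not reported."""
    match = _WRITE_CACHE_RE.search(hdparm_output)
    if match is None:
        return None
    return match.group(1) == "1"


def query_write_cache(device):
    """Run `hdparm -W <device>` (typically needs root) and parse its output.

    Hypothetical helper: `device` is whatever block device path backs the OSD.
    """
    result = subprocess.run(
        ["hdparm", "-W", device],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_write_cache(result.stdout)
```

If the test is to run with the cache off, "hdparm -W 0 </path-to-disk-dev>" disables it on the given disk; whether that setting survives a power cycle depends on the drive, so it is worth re-checking after a reboot.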


One more question please - is this a bare metal deployment or containerized (Rook?) one?

And I presume an OSD restart is a rare event in your cluster, isn't it? That's probably why you haven't faced the issue before...

Thanks in advance,

--

Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
