On 12/20/2021 2:58 PM, Andrej Filipcic wrote:
On 12/20/21 12:47, Igor Fedotov wrote:
Thanks for the info.
Just in case - is write caching disabled for the disk in question?
What's the output for "hdparm -W </path-to-disk-dev>" ?
no, it is enabled. Shall I disable that on all OSDs?
I can't tell you for sure whether this is the root cause. Generally, upstream
recommends disabling write caching due to multiple performance issues
we've observed. I don't recall any report about data corruption, though, but
I can still imagine something like that. On the other hand, as far as I
could see from the initial log, there was no node reboot/shutdown
during the upgrade, hence hardware write caching is unlikely to be involved.
Am I right that there was no node shutdown in your case?
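For reference, a minimal sketch of how to check and disable the volatile write cache with hdparm. The device path is a placeholder; substitute each OSD's backing disk. Note that the setting typically does not survive a reboot, so it would need to be reapplied (e.g. via a udev rule).

```shell
# Hypothetical device path - replace with your OSD's backing disk.
DEV=/dev/sdX

# Query the current write-caching state (look for "write-caching = 1 (on)").
hdparm -W "$DEV"

# Disable the drive's volatile write cache.
hdparm -W 0 "$DEV"
```

This only affects the disk's own volatile cache; a controller with a battery-backed cache is a separate consideration.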
And it would be an interesting experiment to see whether the data corruption
is indeed related. So it would be great if you could test that...
One more question please - is this a bare metal deployment or
containerized (Rook?) one?
And I presume OSD restart is a rare event in your cluster, isn't it?
That's why you probably haven't faced the issue before...
Thanks in advance,
--
Igor Fedotov
Ceph Lead Developer
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx