Re: Disk failures

On 09 Jun 2016 02:09, "Christian Balzer" <chibi@xxxxxxx> wrote:
> Ceph currently doesn't do any (relevant) checksumming at all, so if a
> PRIMARY PG suffers from bit-rot this will be undetected until the next
> deep-scrub.
>
> This is one of the longest-standing and gravest outstanding issues with
> Ceph, and it is supposed to be addressed by bluestore (which currently
> doesn't have checksum-verified reads either).
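
For context, when I don't want to wait for the scheduled scrub I force one by hand, roughly like the sketch below; this is only a sketch, assuming the standard ceph CLI on an admin node and using "2.3f" as a placeholder PG id.

    #!/usr/bin/env python
    # Sketch only: force a deep scrub on one PG via the ceph CLI and then
    # check cluster health for inconsistent PGs. "2.3f" is a placeholder.
    import subprocess

    pgid = "2.3f"  # placeholder PG id

    # Ask the cluster to deep-scrub this PG as soon as possible.
    subprocess.check_call(["ceph", "pg", "deep-scrub", pgid])

    # If the deep scrub found mismatching replicas, health will report
    # the PG as inconsistent.
    print(subprocess.check_output(["ceph", "health", "detail"]))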

So if bit rot happens on the primary PG, does Ceph spread the corrupted data across the cluster?
What would be sent to the replicas: the original data or the data that was saved to disk?

When bit rot happens I'll have 1 corrupted object and 2 good ones.
How do you manage this between deep scrubs? Which data would Ceph use? I think that bit rot on a huge VM block device could lead to a mess, like the whole device being corrupted.
Would a VM affected by bit rot be able to stay up and running?
And what about bit rot on a qcow2 file?
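
To make the question concrete: on Jewel I was planning to inspect an inconsistent PG with something like the sketch below before ever running a repair. It assumes the rados and ceph CLIs, uses "2.3f" as a placeholder PG id, and the JSON field names are from memory, so treat it as a rough outline.

    #!/usr/bin/env python
    # Sketch only: list the objects in an inconsistent PG whose replicas
    # disagree, then (commented out) the repair step. "2.3f" is a placeholder.
    import json
    import subprocess

    pgid = "2.3f"  # placeholder PG id

    out = subprocess.check_output(
        ["rados", "list-inconsistent-obj", pgid, "--format=json"])
    report = json.loads(out)
    for entry in report.get("inconsistents", []):
        print(entry.get("object", {}).get("name"), entry.get("errors"))

    # Only after looking at which shard is bad would I trigger:
    # subprocess.check_call(["ceph", "pg", "repair", pgid])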

Let me try to explain: when writing to the primary PG, I intend to write bit "1".
Due to bit rot, "0" gets saved instead.
Would Ceph read back the bit that was written to disk and spread that across the cluster (so it would spread "0"), or would it spread the in-memory value "1"?
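
I know I can add my own end-to-end check on top of RADOS; something like the sketch below (python-rados, with a pool named "rbd" and an object name "myobject" as placeholders) at least detects the flipped bit on read, but it doesn't tell me what Ceph replicates internally, which is what I'm asking about.

    #!/usr/bin/env python
    # Sketch only: application-level checksum with python-rados.
    # Pool "rbd" and object "myobject" are placeholders.
    import hashlib
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    ioctx = cluster.open_ioctx("rbd")

    data = b"\x01"  # the value I intend to store
    ioctx.write_full("myobject", data)
    # Keep a checksum next to the object as an xattr.
    ioctx.set_xattr("myobject", "sha256",
                    hashlib.sha256(data).hexdigest().encode())

    # On read, verify what came back against the stored checksum.
    stored = ioctx.read("myobject")
    expected = ioctx.get_xattr("myobject", "sha256").decode()
    if hashlib.sha256(stored).hexdigest() != expected:
        raise IOError("checksum mismatch: possible bit rot on the copy read")

    ioctx.close()
    cluster.shutdown()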

What if the journal fails during a read or a write? Is Ceph able to recover by dropping that journal from the affected OSD (and keep running at lower speed), or should I use RAID1 on the SSDs used for the journals?
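
For the journal part, what I have in mind if the SSD dies is the usual FileStore journal swap, sketched below. This is just a sketch: it assumes a systemd host, OSD id 0 as a placeholder, that the new journal device/symlink is already in place, and that the flush step only works if the old journal is still readable.

    #!/usr/bin/env python
    # Sketch only: replace a FileStore journal for OSD 0 (placeholder id).
    # Assumes /var/lib/ceph/osd/ceph-0/journal already points at the new
    # device and that the host uses systemd.
    import subprocess

    osd_id = "0"  # placeholder OSD id

    subprocess.check_call(["systemctl", "stop", "ceph-osd@%s" % osd_id])
    # Flush the old journal to the data disk (only if it is still readable).
    subprocess.call(["ceph-osd", "-i", osd_id, "--flush-journal"])
    # Create a fresh journal on the new device.
    subprocess.check_call(["ceph-osd", "-i", osd_id, "--mkjournal"])
    subprocess.check_call(["systemctl", "start", "ceph-osd@%s" % osd_id])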

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
