Re: CEPH over SW-RAID

Yes, but with SW-RAID, when we have a block that was read and does not match its checksum, the device falls out of the array and the data is read again from the other devices in the array. The problem is that with SW-RAID1 the bad blocks are not isolated. The disks can be synchronized again because the write operation is not verified, so the problem (the device falling out of the array) will happen again if we try to read any other data written over the bad block.
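
For context, the mdadm flow I am describing looks roughly like this (the device names are just examples):

    cat /proc/mdstat                              # the array shows up degraded after the read error
    mdadm --manage /dev/md0 --re-add /dev/sdb1    # the member is re-added and resynced; the resync
                                                  # rewrites the data but never verifies the writes,
                                                  # so the bad sector stays in place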

My new question regarding Ceph is whether it isolates these bad sectors where it found bad data while scrubbing, or will there always be a replica of something sitting on a known bad block?
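
From what I have seen, a scrub error surfaces and gets repaired roughly like this (the PG id is just an example), but that only rewrites the bad copy from a good replica, and I don't know whether the bad sector itself is avoided afterwards:

    ceph health detail     # reports something like "HEALTH_ERR ... pgs inconsistent; scrub errors"
    ceph pg repair 2.5     # 2.5 is an example PG id; rewrites the inconsistent copy from a good replica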

I also saw that Ceph uses some metrics when capturing data from the disks. When a disk is resetting or having problems, its metrics will be bad and the cluster will rank that OSD poorly. But I didn't see any way of sending alerts or anything like that. SW-RAID has its mdadm monitor, which alerts when things go bad. Do I have to keep looking at the Ceph logs all the time to see when things go wrong?
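
With mdadm I can just run the monitor daemon and get an e-mail; the closest thing I have found for Ceph is to poll the health status myself, something like this (the address is just an example):

    # mdadm's built-in alerting, for comparison
    mdadm --monitor --scan --daemonise --mail=admin@example.com

    # crude Ceph equivalent, e.g. from cron: mail me whenever the cluster is not healthy
    ceph health | grep -q HEALTH_OK || ceph health detail | mail -s "ceph alert" admin@example.com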

Thanks.
Jose Tavares

On Mon, Nov 23, 2015 at 3:19 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:

Most people run their clusters with no RAID for the data disks (some
will run RAID for the journals, but we don't). We use the scrub
mechanism to find data inconsistencies, and we use three copies to
provide redundancy across hosts/racks, etc. Unless you have a specific
need, it is best to forgo Linux SW RAID, and even HW RAID, with Ceph.
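
For example, with a replicated pool (the pool name here is just an illustration):

    ceph osd pool set rbd size 3       # keep three copies of every object
    ceph osd pool set rbd min_size 2   # keep serving I/O as long as two copies are available

    # and in the CRUSH rule, spread the copies across hosts (or racks):
    step chooseleaf firstn 0 type host
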
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Nov 23, 2015 at 10:09 AM, Jose Tavares  wrote:
> Hi guys ...
>
> Is there any advantage in running CEPH over a Linux SW-RAID to avoid data
> corruption due to disk bad blocks?
>
> Can we just rely on the scrubbing feature of CEPH? Can we live without an
> underlying layer that keeps hardware problems from being passed up to CEPH?
>
> I have a setup where I put one OSD per node and I have a 2-disk RAID-1
> setup. Is that a good option, or would it be better to have 2 OSDs, one on
> each disk? If I had one OSD per disk, I would have to increase the number of
> replicas to guarantee enough replicas if one node goes down.
>
> Thanks a lot.
> Jose Tavares
