So I assume we _are_ talking about bit-rot?

> On 23 Nov 2015, at 18:37, Jose Tavares <jat@xxxxxxxxxxxx> wrote:
>
> Yes, but with SW-RAID, when we have a block that was read and does not
> match its checksum, the device falls out of the array, and the data is read
> again from the other devices in the array.

That's not true. SW-RAID reads data from one drive only. Comparison of the
data on different drives only happens when a check is executed, and that
doesn't help with bit-rot one bit :-) (A quick way to kick off such a check
by hand is sketched at the end of this reply.)

(The same goes for various SANs and arrays, but those usually employ an
additional CRC for the data, so their effective BER is orders of magnitude
better.)

> The problem is that in SW-RAID1 we don't have the bad blocks isolated. The
> disks can be synchronized again as the write operation is not tested. The
> problem (device falling out of the array) will happen again if we try to
> read any other data written over the bad block.

Not true either. Bit-rot happens not (only) when the data gets written
wrong, but when it is read. If you read one block long enough you will get
wrong data once every $BER_bits (back-of-the-envelope numbers below).
Rewriting the data doesn't help. (It's a bit different with some SSDs that
don't refresh blocks, so rewriting/refreshing them might help.)

> My new question regarding Ceph is whether it isolates these bad sectors
> where it found bad data when scrubbing, or will there always be a replica
> of something sitting on a known bad block?
>
> I also saw that Ceph uses some metrics when capturing data from disks. When
> a disk is resetting or having problems, its metrics are going to be bad and
> the cluster will rank that OSD badly. But I didn't see any way of sending
> alerts or anything like that. SW-RAID has its mdadm monitor that alerts
> when things go bad. Do I have to be watching the Ceph logs all the time to
> see when things go bad?

You should graph every drive and look for anomalies (a minimal monitoring
sketch follows at the end of this reply). Ceph only detects a problem when
the drive is already close to unusable (the ceph-osd process itself
typically blocks for tens of seconds). Ceph is not really good when it comes
to latency SLAs, no matter how hard you try, but that is usually sufficient.
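To make the md "check" point concrete -- a minimal sketch, assuming the
array is /dev/md0 and that this runs as root (distros typically run the
equivalent from a periodic cron job):

  #!/usr/bin/env python3
  # Minimal sketch: kick off an md consistency check and report the mismatch
  # count afterwards. Assumes the array is /dev/md0 and that this runs as root.
  import time
  from pathlib import Path

  md = Path("/sys/block/md0/md")               # adjust to your array

  (md / "sync_action").write_text("check\n")   # same as: echo check > .../sync_action

  while (md / "sync_action").read_text().strip() != "idle":
      time.sleep(30)                           # a check on a big array takes hours

  print("mismatch_cnt:", (md / "mismatch_cnt").read_text().strip())

Even when mismatch_cnt ends up non-zero, md has no way of telling which copy
is the correct one -- which is exactly why this does not protect you from
bit-rot.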
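The back-of-the-envelope arithmetic behind the $BER_bits remark, assuming
the 1e-14 errors/bit figure that consumer SATA spec sheets usually quote
(check your own drive's datasheet):

  # Expected number of unrecoverable read errors for a given amount of data
  # read, given the vendor's quoted bit error rate. 1e-14 per bit read is the
  # usual consumer SATA figure; enterprise drives tend to quote 1e-15.
  ber = 1e-14               # unrecoverable errors per bit read
  data_read_tb = 10.0       # say, a couple of full passes over a 4-6 TB drive

  bits_read = data_read_tb * 1e12 * 8
  print("expected bad reads: %.2f" % (bits_read * ber))   # ~0.8

In other words, read the whole drive often enough and a bad read is a matter
of when, not if -- independent of how often the data was (re)written.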
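And on alerting: there is nothing built into Ceph that mails you the way
mdadm --monitor does, so poll the cluster health yourself and wire it into
whatever monitoring you already have. A trivial sketch -- the interval and
the bare print() are just placeholders for your own alerting:

  #!/usr/bin/env python3
  # Minimal sketch: poll "ceph health" and complain whenever it is not HEALTH_OK.
  import subprocess
  import time

  INTERVAL = 60   # seconds; pick whatever suits you

  while True:
      out = subprocess.check_output(["ceph", "health"]).decode().strip()
      if not out.startswith("HEALTH_OK"):
          print("ALERT:", out)     # replace with mail/nagios/whatever you use
      time.sleep(INTERVAL)

For the per-drive graphs, feeding "smartctl -A /dev/sdX" (reallocated and
pending sector counts in particular) plus the OSD commit/apply latencies
into your graphing stack is usually enough to spot a dying disk long before
Ceph complains.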
> Thanks.
> Jose Tavares
>
> On Mon, Nov 23, 2015 at 3:19 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx>
> wrote:
>
>> Most people run their clusters with no RAID for the data disks (some
>> will run RAID for the journals, but we don't). We use the scrub
>> mechanism to find data inconsistencies and we use three copies to do
>> RAID over hosts/racks, etc. Unless you have a specific need, it is
>> best to forgo Linux SW RAID or HW RAID with Ceph.
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>>
>> On Mon, Nov 23, 2015 at 10:09 AM, Jose Tavares wrote:
>>> Hi guys ...
>>>
>>> Is there any advantage in running Ceph over a Linux SW-RAID to avoid
>>> data corruption due to disk bad blocks?
>>>
>>> Can we just rely on the scrubbing feature of Ceph? Can we live without
>>> an underlying layer that keeps hardware problems from being passed on
>>> to Ceph?
>>>
>>> I have a setup where I put one OSD per node and I have a 2-disk RAID-1
>>> setup. Is that a good option, or would it be better to have 2 OSDs,
>>> one on each disk? If I had one OSD per disk, I would have to increase
>>> the number of replicas to guarantee enough replicas if one node goes
>>> down.
>>>
>>> Thanks a lot.
>>> Jose Tavares

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com