On 13 January 2016 at 12:26, Magnus Hagdorn <magnus.hagdorn@xxxxxxxx> wrote:
> Hi there,
> we recently had a problem with two OSDs failing because of I/O errors of
> the underlying disks. We run a small ceph cluster with 3 nodes and 18 OSDs
> in total. All 3 nodes are dell poweredge r515 servers with PERC H700
> (MegaRAID SAS 2108) RAID controllers. All disks are configured as single
> disk RAID 0 arrays. A disk on two separate nodes started showing I/O
> errors reported by SMART, with one of the disks reporting a pre-failure
> SMART error. The node with the failing disk also reported XFS I/O errors.
> In both cases the OSD daemons kept running, although ceph reported that
> they were slow to respond. When we started to look into this we first
> tried restarting the OSDs. They then failed straight away. We ended up
> with data loss. We are running ceph 0.80.5 on Scientific Linux 6.6 with a
> replication level of 2. We had hoped that losing disks due to hardware
> failure would be recoverable.
>
> Is this a known issue with the RAID controllers or this version of ceph?

If you have a replication level of 2 and lose 2 disks from different nodes
simultaneously, you're going to get data loss.

Some portion of your data will have its primary copy on disk A (in node 1)
and the backup copy on disk B (in node 2), and some more data will have the
primary copy on B and the backup on A. If you lose A and B at the same
time, there are no other copies of those bits of data.

If you had only lost one disk (e.g. A), then ceph would shuffle things
around and duplicate the data from the backup copy, so that (after
recovery) you have two copies again. Ceph also makes sure that the copies
are on different nodes, in case you lose an entire node - but in this case,
you've lost two disks on separate nodes.

If you want to tolerate two simultaneous disk failures across your cluster,
then you need to have 3 copies of your data (or an appropriate
erasure-coding setup).

Thanks,
Andy
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
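
A minimal sketch of what raising the replication level looks like, assuming
a replicated pool named "rbd" (the pool name is an assumption; substitute
your own pools, and expect the size change to trigger backfill traffic while
the third copies are created):

    # show the current replication settings for the pool (pool name assumed)
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

    # raise to 3 copies so two simultaneous disk failures are survivable
    ceph osd pool set rbd size 3
    # keep serving I/O as long as 2 of the 3 copies remain available
    ceph osd pool set rbd min_size 2

Erasure coding is the other route mentioned above; on a 0.80.x cluster that
would be roughly "ceph osd erasure-code-profile set myprofile k=2 m=2"
(profile name and k/m values are placeholders) followed by creating a new
erasure-coded pool with that profile, but EC pools carry their own
restrictions in that release (for example, RBD cannot sit directly on one
without a cache tier), so check the documentation for your version first.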