Re: lost OSD due to failing disk

Andy Allan <gravitystorm@xxxxxxxxx> · Wed, 13 Jan 2016 13:32:24 +0000

On 13 January 2016 at 12:26, Magnus Hagdorn <magnus.hagdorn@xxxxxxxx> wrote:
> Hi there,
> we recently had a problem with two OSDs failing because of I/O errors of the
> underlying disks. We run a small ceph cluster with 3 nodes and 18 OSDs in
> total. All 3 nodes are Dell PowerEdge R515 servers with PERC H700 (MegaRAID
> SAS 2108) RAID controllers. All disks are configured as single disk RAID 0
> arrays. A disk on two separate nodes started showing I/O errors reported by
> SMART, with one of the disks reporting a pre-failure SMART error. The node
> with the failing disk also reported XFS I/O errors. In both cases the OSD
> daemons kept running although ceph reported that they were slow to respond.
> When we started to look into this we first tried restarting the OSDs. They
> then failed straight away. We ended up with data loss. We are running ceph
> 0.80.5 on Scientific Linux 6.6 with a replication level of 2. We had hoped
> that losing disks due to hardware failure would be recoverable.
>
> Is this a known issue with the RAID controllers or with this version of ceph?

If you have a replication level of 2, and lose 2 disks from different
nodes simultaneously, you're going to get data loss. Some portion of
your data will have its primary copy on disk A (in node 1) and the
backup copy on disk B (in node 2) (and some more data will have the
primary copy on B and backup on A) - if you lose A and B at the same
time then there are no other copies of that data.

If you only lost one disk (e.g. A) then ceph would shuffle things
around and duplicate the data from the backup copy, so that (after
recovery) you have two copies again. Ceph also makes sure that the
copies are on different nodes, in case you lose an entire node - but
in this case, you've lost two disks on separate nodes.

If you want to tolerate two simultaneous disk failures across your
cluster, then you need to have 3 copies of your data (or an
appropriate erasure-coding setup).
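
To put rough numbers on the above, here is a small Python sketch. It is
not how CRUSH actually places data: it assumes 3 nodes with 6 OSDs each
(18 in total, as in your cluster), that every copy of an object lands on
a different node, and that all such placements are equally likely. The
names (placements, fraction_lost) are made up for this illustration.

```python
from itertools import combinations, product

NODES = 3
OSDS_PER_NODE = 6  # assumed even split of the 18 OSDs across 3 nodes

# Label each OSD as (node, index within node).
osds = [(n, i) for n in range(NODES) for i in range(OSDS_PER_NODE)]


def placements(size):
    """All ways to put `size` copies on OSDs in `size` distinct nodes."""
    result = []
    for nodes in combinations(range(NODES), size):
        per_node = [[o for o in osds if o[0] == n] for n in nodes]
        result.extend(frozenset(p) for p in product(*per_node))
    return result


def fraction_lost(size, failed):
    """Fraction of placements whose every copy sits on a failed OSD."""
    all_p = placements(size)
    lost = [p for p in all_p if p <= failed]
    return len(lost) / len(all_p)


# Disk A in node 0 and disk B in node 1 fail at the same time.
failed = frozenset({(0, 0), (1, 0)})
print(fraction_lost(2, failed))  # 1 placement out of 108
print(fraction_lost(3, failed))  # no placement has all 3 copies on 2 disks
```

Under these assumptions, with 2 copies roughly 1 in 108 placements has
both copies on exactly the two failed disks, which is the data that is
gone. With 3 copies, no placement can fit entirely on two disks, so two
simultaneous failures leave at least one copy of everything.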

Thanks,
Andy
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com