Re: lost OSD due to failing disk

2016-01-14 11:25 GMT+02:00 Magnus Hagdorn <magnus.hagdorn@xxxxxxxx>:
On 13/01/16 13:32, Andy Allan wrote:
On 13 January 2016 at 12:26, Magnus Hagdorn <magnus.hagdorn@xxxxxxxx> wrote:
Hi there,
we recently had a problem with two OSDs failing because of I/O errors on the
underlying disks. We run a small ceph cluster with 3 nodes and 18 OSDs in
total. All 3 nodes are Dell PowerEdge R515 servers with PERC H700 (MegaRAID
SAS 2108) RAID controllers. All disks are configured as single-disk RAID 0
arrays. Disks on two separate nodes started showing I/O errors reported by
SMART, with one of them reporting a pre-failure SMART error. The node with
the failing disk also reported XFS I/O errors. In both cases the OSD daemons
kept running, although ceph reported that they were slow to respond. When we
started to look into this we first tried restarting the OSDs. They then
failed straight away. We ended up with data loss. We are running ceph 0.80.5
on Scientific Linux 6.6 with a replication level of 2. We had hoped that
losing disks due to hardware failure would be recoverable.

Is this a known issue with the RAID controllers or with this version of ceph?

If you had only lost one disk (e.g. A) then ceph would shuffle things
around and re-replicate the data from the surviving copy, so that (after
recovery) you have two copies again. Ceph also makes sure that the
copies are on different nodes, in case you lose an entire node - but
in this case, you've lost two disks on separate nodes.
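
For reference, you can check how many copies a pool keeps and how CRUSH
separates them with something like the following (the pool name "rbd" is
only an example):

    ceph osd pool get rbd size
    ceph osd pool get rbd min_size
    ceph osd crush rule dump

In the default replicated ruleset it is the "chooseleaf ... type host" step
that keeps the copies of a placement group on different hosts.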

AFAICT, the two failures were a few days apart. The main issue is that ceph didn't detect the failures. It *only* warned that there were two slowly responding OSDs. This is precisely our worry. How come ceph didn't detect and mitigate the failures?

I think that's because the OSDs weren't down, just slow on reads/writes (hence the slow-response warnings). I think ceph only takes action when an OSD stops responding altogether, at which point it is marked down.
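
As far as I know the relevant knobs are the heartbeat grace period and the
down-out interval, and a slow-but-alive OSD never trips them. In ceph.conf
that would look roughly like this (the values shown are the usual defaults,
so check your version's documentation):

    [osd]
    osd heartbeat grace = 20        ; missed-heartbeat seconds before peers report an OSD down

    [mon]
    mon osd down out interval = 300 ; seconds a down OSD waits before being marked out

You can also take a suspect OSD out of data placement by hand with
"ceph osd out <id>" as soon as SMART starts complaining, which makes ceph
re-replicate its data onto the healthy disks.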
In any case, with a replication level of 2 (pool size 2, i.e. a primary and one replica) you should still have an intact copy of your data on the host that is still up. Setting min_size to 1 on the pool should give you access to that data, although this is not recommended: if a HDD fails on that node too, your data will be lost.
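
For completeness, the min_size change is a one-liner (the pool name "rbd"
is only an example; check the current value first and put it back once
recovery has finished):

    ceph osd pool get rbd min_size
    ceph osd pool set rbd min_size 1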
 
Cheers
magnus

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
