lost OSD due to failing disk

Hi there,
We recently had a problem with two OSDs failing because of I/O errors on the underlying disks. We run a small Ceph cluster with 3 nodes and 18 OSDs in total. All 3 nodes are Dell PowerEdge R515 servers with PERC H700 (MegaRAID SAS 2108) RAID controllers. All disks are configured as single-disk RAID 0 arrays.

A disk on each of two separate nodes started showing I/O errors reported by SMART, with one of the disks reporting a pre-failure SMART error. The node with the failing disk also reported XFS I/O errors. In both cases the OSD daemons kept running, although Ceph reported that they were slow to respond. When we started to look into this, we first tried restarting the OSDs. They then failed straight away, and we ended up with data loss.

We are running Ceph 0.80.5 on Scientific Linux 6.6 with a replication level of 2. We had hoped that losing disks due to hardware failure would be recoverable.

Is this a known issue with these RAID controllers or with this version of Ceph?

Regards
magnus


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


