CRUSH map advice


On Thu, Aug 14, 2014 at 12:47 AM, Christian Balzer <chibi at gol.com> wrote:
>
> Hello,
>
> On Tue, 12 Aug 2014 10:53:21 -0700 Craig Lewis wrote:
>
>> That's a low probability, given the number of disks you have.  I would've
>> taken that bet (with backups).  As the number of OSDs goes up, the
>> probability of multiple simultaneous failures goes up, and slowly
>> becomes a bad bet.
>>
>
> I must be very unlucky then. ^o^
> As in, I've had dual disk failures in a set of 8 disks 3 times now
> (within the last 6 years).
> And twice that led to data loss, once with RAID5 (no surprise there) and
> once with RAID10 (unlucky failure of neighboring disks).
> Granted, that was with consumer HDDs and the last one with rather well
> aged ones, too. But there you go.

Yeah, I'd say you're unlucky, unless you're running a pretty large cluster.
I usually run my 8-disk arrays in RAID-Z2 / RAID6, though; 5 disks is my
limit for RAID-Z1 / RAID5.

I've been lucky so far.  No double failures in my RAID-Z1 / RAID5 arrays,
and no triple failures in my RAID-Z2 / RAID6 arrays.  After 15 years and
hundreds of arrays, I should've had at least one.  I have had several
double failures in RAID1, but none of those were important.
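The intuition here, that simultaneous failures are rare in a single small array but become likely once you count enough disks, can be made concrete with a simple binomial model. This is a minimal sketch, assuming disk failures are independent and that `p_fail` is a hypothetical per-disk probability of dying within one rebuild window. Real failures are often correlated (shared vibration, power, or a bad batch of disks), so the independent model is likely optimistic. The function name is illustrative and not part of any Ceph or RAID tool.

```python
from math import comb


def prob_at_least_k_failures(n_disks: int, p_fail: float, k: int) -> float:
    """Probability that at least k of n_disks fail in the same window.

    Hypothetical model: each disk fails independently with probability
    p_fail during the window, so the failure count is binomial.
    """
    return sum(
        comb(n_disks, i) * p_fail**i * (1 - p_fail) ** (n_disks - i)
        for i in range(k, n_disks + 1)
    )


if __name__ == "__main__":
    p = 0.01  # hypothetical: 1% chance a given disk dies during one rebuild window
    for n in (5, 8, 100, 1000):
        # k=2: a double failure somewhere among the n disks (fatal to RAID5 /
        # RAID-Z1 if both land in the same array); k=3: a triple failure
        # (fatal to RAID6 / RAID-Z2 under the same condition).
        print(n, prob_at_least_k_failures(n, p, 2), prob_at_least_k_failures(n, p, 3))
```

Holding `p_fail` fixed and raising `n_disks` shows the chance of a double or triple failure somewhere in the pool climbing quickly, which is the reasoning behind capping RAID-Z1 / RAID5 at a handful of disks and using RAID-Z2 / RAID6 for wider arrays.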


If this isn't a big cluster, I would suspect that you have a vibration or
power issue.  Both are known to cause premature death in HDDs.  Of course,
rebuilding a degraded RAID is also a well-known cause of premature HDD
death.



> As for backups, those are for when somebody does something stupid and
> deletes stuff they shouldn't have.
> A storage system should be a) up all the time and b) not lose data.


I completely agree, but never trust it.

Over the years, I've used backups to recover when:

   - I do something stupid
   - My developers do something stupid
   - Hardware does something stupid
   - Manufacturer firmware does something stupid
   - Manufacturer tech support tells me to do something stupid
   - My datacenter does something stupid
   - My power companies do something stupid

I've lost data from a software RAID0, all the way up to a
quadruply redundant multi-million-dollar hardware storage array.
Regardless of the promises printed on the box, it's the contingency plans
that keep the paychecks coming.

