Hello,

although I don't know much about this topic, I believe that Ceph erasure
coding will probably solve a lot of these issues, with some speed trade-off.
With erasure coding the redundant data eats far less disk capacity, so you
could afford a higher level of redundancy with a lower disk-usage penalty.

Wolfgang

On 12/19/2013 09:39 AM, Christian Balzer wrote:
>
> Hello,
>
> In my "Sanity check" thread I postulated yesterday that to get the same
> redundancy and resilience against disk failures (excluding other factors)
> as my proposed setup (2 nodes, 2x 11 3TB HDs in RAID6 per node, 2 global
> hot spares, thus 4 OSDs), the "Ceph way" would need something like 6 nodes
> with 10 3TB HDs each and 3-way replication (to protect against dual disk
> failures) to get similar capacity, plus a 7th identical node to allow for
> node failure/maintenance.
>
> That was basically based on me thinking "must not get caught by a dual
> disk failure ever again", as that has happened to me twice: once with a
> RAID5 and the expected consequences, once with a RAID10 where I got lucky
> (8 disks total each time).
>
> However something was nagging me at the back of my brain and it turned
> out to be my long forgotten statistics classes in school. ^o^
>
> So after reading some articles basically all telling the same things, I
> found this: https://www.memset.com/tools/raid-calculator/
>
> Now this is based on assumptions, onto which I will add some more, but
> the last sentence on that page is still quite valid.
>
> So let's compare the 2 configurations above. I assumed 75MB/s recovery
> speed for the RAID6 configuration, something I've seen in practice.
> Basically that's half speed, something that will be lower during busy
> hours and higher during off-peak hours. I made the same assumption for
> Ceph with a 10Gb/s network, assuming 500MB/s recovery/rebalancing speed.
> The rebalancing would have to compete with other replication traffic
> (likely not much of an issue) and with the actual speed/load of the
> individual drives involved. Note that if we assume a totally quiet setup,
> where 100% of all resources would be available for recovery, the numbers
> would of course change, but NOT their ratios.
> I went with the default disk lifetime of 3 years and a 0-day replacement
> time. The latter of course gives very unrealistic results for anything
> without a hot-spare drive, but we're comparing 2 different beasts here.
>
> So that all said, the results of that page that make sense in this
> comparison are the "RAID6 + 1 hotspare" numbers. As in, how likely is a
> 3rd drive failure in the time before recovery is complete? The
> replacement setting of 0 gives us the best possible number, and since one
> would deploy a Ceph cluster with sufficient extra capacity, that's what
> we shall use.
>
> For the RAID6 setup (12 HDs total) this gives us a pretty comfortable
> 1 in 58497.9 ratio of data loss per year.
> Alas, for the 70 HDs in the comparable Ceph configuration we wind up with
> just a 1 in 13094.31 ratio, which while still quite acceptable clearly
> shows where this is going.
>
> So am I completely off my wagon here?
> How do people deal with this when potentially deploying hundreds of disks
> in a single cluster/pool?
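
For what it's worth, below is a very rough back-of-envelope sketch in Python
of that kind of calculation. It is not the memset calculator's model, just a
toy of my own: independent drive failures, a 3-year mean drive lifetime, the
rebuild windows implied by the 75 MB/s and 500 MB/s figures above, and the
pessimistic assumption that any failure beyond what the layout tolerates
means data loss. The function and parameter names are mine and the absolute
numbers will not match the calculator's, but it shows how the odds
deteriorate with disk count (including the 600-disk case below) and how much
a hypothetical erasure-code profile that survives 3 losses (e.g. k=6, m=3)
would buy back.

#!/usr/bin/env python
# Toy failure-probability sketch, NOT the memset calculator's model.
# Assumptions: independent drive failures, 3-year mean drive lifetime,
# data loss whenever more drives fail during one recovery window than
# the layout can tolerate. All parameter values are illustrative.
import math

def binom_tail(n, k_min, p):
    """P(at least k_min successes out of n independent Bernoulli(p) trials)."""
    q = 0.0
    for k in range(k_min):
        c = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
        q += c * p**k * (1.0 - p)**(n - k)
    return 1.0 - q

def loss_odds_per_year(disks, tolerated, disk_tb, recover_mb_s,
                       mean_life_h=3 * 365 * 24):
    rate = 1.0 / mean_life_h                          # per-disk failure rate [1/h]
    rebuild_h = disk_tb * 1e6 / recover_mb_s / 3600.0 # recovery window [h]
    p = 1.0 - math.exp(-rate * rebuild_h)             # P(a given disk dies during rebuild)
    # expected first failures per year, times the chance that enough further
    # disks fail before that recovery finishes
    per_year = disks * 8760.0 * rate * binom_tail(disks - 1, tolerated, p)
    return 1.0 / per_year                             # "1 in X" chance of loss per year

print("RAID6, 12 disks, 75 MB/s:          1 in %.0f" % loss_odds_per_year(12, 2, 3.0, 75))
print("Ceph 3x, 70 disks, 500 MB/s:       1 in %.0f" % loss_odds_per_year(70, 2, 3.0, 500))
print("Ceph 3x, 600 disks, 500 MB/s:      1 in %.0f" % loss_odds_per_year(600, 2, 3.0, 500))
print("Ceph EC m=3, 600 disks, 500 MB/s:  1 in %.0f" % loss_odds_per_year(600, 3, 3.0, 500))

Run it with any other numbers you like; the point is the scaling of the
ratios, not the absolute values.
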
>
> I mean, when we get to 600 disks (and that's just one rack full; OK,
> maybe 2 due to load and other issues ^o^) of those 4U 60-disk storage
> servers (or 72 disks per 4U if you're happy with killing another drive
> when replacing a faulty one in that Supermicro contraption), that ratio
> is down to 1 in 21.6, which is way worse than the 8-disk RAID5 I
> mentioned up there.
>
> Regards,
>
> Christian

-- 
DI (FH) Wolfgang Hennerbichler
Software Development
Unit Advanced Computing Technologies
RISC Software GmbH
A company of the Johannes Kepler University Linz

IT-Center
Softwarepark 35
4232 Hagenberg
Austria

Phone: +43 7236 3343 245
Fax: +43 7236 3343 250
wolfgang.hennerbichler@xxxxxxxxxxxxxxxx
http://www.risc-software.at

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com