-----Original Message-----
From: gluster-devel-bounces+7220022=gmail.com@xxxxxxxxxx [mailto:gluster-devel-bounces+7220022=gmail.com@xxxxxxxxxx] On Behalf Of Gordan Bobic
Sent: Tuesday, April 10, 2012 3:45 PM
To: gluster-devel@xxxxxxxxxx
Subject: Re: GlusterFS Spare Bricks?

On 10/04/2012 09:39, 7220022 wrote:
> Are there plans to add provisioning of spare bricks in a replicated
> (or distributed-replicated) configuration? E.g., when a brick in a
> mirror set dies, the system rebuilds it automatically on a spare,
> similar to how it's done by RAID controllers.
>
> Not only would it improve practical reliability, especially of large
> clusters, but it would also make it possible to build better-performing
> clusters out of less expensive components. For example, instead of
> having slow RAID5 bricks on expensive RAID controllers, one uses cheap
> HBAs and stripes a few disks per brick in RAID0 - that's faster for
> writes than RAID 5/6 by an order of magnitude (and, by the way, should
> improve the rebuild times in Gluster that many are complaining about).
> A failure of one such striped brick is not catastrophic in a mirrored
> Gluster - but it's better to have spare bricks standing by, strewn
> across the cluster heads.
>
> A more advanced setup at the hardware level creates "hybrid disks",
> whereby HDD vdisks are cached by enterprise-class SSDs. It works
> beautifully and makes HDDs amazingly fast for random transactions. The
> technology has become widely available on many $500 COTS controllers.
> However, it is not widely known that the results with HDDs in RAID0
> under an SSD cache are 10 to 20 (!!) times better than with RAID 5 or 6.

On reads the difference should be negligible unless the array is
degraded. If it's not, your RAID controller is unfit for purpose.

[AS] I refer to random IOPS in the 70K to 200K range on vdisks in RAID 0
vs. RAID 5 behind a large SSD cache. The behaviour of such "hybrid"
vdisks differs from that of pure SSD- or HDD-based ones. Unlike a DDR
RAM cache, the total read+write bandwidth (in MB/s) of an SSD is capped
at roughly its maximum read-only throughput. Hence front-end read
performance drops by the amount of (sequential) write traffic flowing
into the cache on its way down to the HDDs. Conversely, the write
performance of the hybrid is dragged down by the slow write speed of a
RAID 5/6 array behind the cache, especially at larger queue depths.
Under most "real-world" test patterns these two limitations combine to
leave the array only marginally better, for both reads and writes, than
an HDD-based RAID10 array with the same number of drives. Not quite sure
why, but it is removing the HDD write-speed limit by changing the RAID
level from 5 to 0 that clears the bottleneck: the relative gain for both
reads and writes is much larger than the raw write-performance gap
between pure HDD RAID 0 and RAID 5 vdisks.

Having said that, a lot of RAID controllers are pretty useless.

[AS] The newer LSI 2208-based ones seem okay, and recent firmware/drivers
are finally stable. But I agree: we skip all RAID features other than
stripe or mirror and do everything in software. The advanced features
(FastPath, CacheCade), though, are fantastic if you use SSDs, either
standalone or as an HDD cache. In fact we use controllers rather than
simple HBAs only to take advantage of these features.
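For context, what the proposed spare-brick feature would automate is
roughly the manual swap below: replacing a dead (e.g. RAID0) brick with a
pre-built spare and letting self-heal repopulate it. A sketch only - the
volume name, hosts and brick paths are made up, and the exact
replace-brick subcommands differ between GlusterFS releases:

  # Swap the failed brick for a pre-fabricated spare on another head
  # (sketch; names are placeholders, syntax varies by release):
  gluster volume replace-brick myvol \
      server2:/bricks/raid0-3 server5:/bricks/spare-1 commit force

  # On a replicated volume, trigger self-heal to repopulate the new brick:
  gluster volume heal myvol full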
> There is no way to use RAID0 in commercial storage, the main reason
> being the absence of hot spares. If, on the other hand, the spares are
> handled by Gluster in the form of (cached hardware-RAID0) pre-fabricated
> bricks, both very good performance and reasonably sufficient redundancy
> should be easily achieved.

So why not use ZFS instead? The write performance is significantly
better than traditional RAID equivalents, and you get vastly more
flexibility than with any hardware RAID solution. And it supports
caching data onto SSDs.

[AS] Good point. We have no experience with it, but we should try it. Do
you know whether it can be made distributed ("parallel") like Gluster,
and whether it supports RDMA transport for storage traffic between
heads? The main reason we've been looking into Gluster is cheap
bandwidth: all our servers and nodes are connected via a 40 Gbit IB
fabric - two ports per server, four on some of the larger ones,
non-blocking edge switches, directors at floor level, etc. - and it sits
80 to 90% idle. Can you make global spares in ZFS?

Gordan

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
https://lists.nongnu.org/mailman/listinfo/gluster-devel
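For the archive: ZFS does cover two of the points above - "cache" vdevs
put an SSD read cache (L2ARC) in front of the pool, and "spare" vdevs are
pool-wide hot spares that a fault-management agent can pull in
automatically on platforms that ship one. ZFS itself is a local
filesystem, though, so the distributed/RDMA part would still need
something like Gluster layered on top of ZFS-backed bricks. A rough
sketch with made-up device names:

  # Sketch only - device names are placeholders; check the zpool man page
  # on your platform for the exact options.
  # Striped mirrors (the RAID10 analogue), an SSD read cache (L2ARC),
  # an SSD intent-log device, and two pool-wide hot spares:
  zpool create tank \
      mirror /dev/sdb /dev/sdc \
      mirror /dev/sdd /dev/sde \
      cache  /dev/sdh \
      log    /dev/sdi \
      spare  /dev/sdf /dev/sdg

  # A spare can also be attached by hand when a mirror member fails:
  zpool replace tank /dev/sdd /dev/sdf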