http://www.google.ru/url?sa=t&rct=j&q=gluster%20virtual%20storage%20appliance%20infiniband%20ssd%20performance&source=web&cd=26&ved=0CFsQFjAFOBQ&url=http%3A%2F%2Fwww.lighthouse-partners.com%2Flinux%2Fpresentations09%2FHPL09-Session6.pdf&ei=c12HT8WtForgtQa79Jy-BA&usg=AFQjCNFwOz2DTSWvSLQETiXtR2Qy-szOPA&cad=rjt

From page 27, the discussion of storage redundancy issues - could be useful too.

-----Original Message-----
From: abperiasamy@xxxxxxxxx [mailto:abperiasamy@xxxxxxxxx] On Behalf Of Anand Babu Periasamy
Sent: Thursday, April 12, 2012 7:56 PM
To: 7220022
Cc: gluster-devel@xxxxxxxxxx
Subject: Re: GlusterFS Spare Bricks?

> -----Original Message-----
> From: abperiasamy@xxxxxxxxx [mailto:abperiasamy@xxxxxxxxx] On Behalf
> Of Anand Babu Periasamy
> Sent: Wednesday, April 11, 2012 10:13 AM
> To: 7220022
> Cc: gluster-devel@xxxxxxxxxx
> Subject: Re: GlusterFS Spare Bricks?
>
> On Tue, Apr 10, 2012 at 1:39 AM, 7220022 <7220022@xxxxxxxxx> wrote:
> >
> > Are there plans to add provisioning of spare bricks in a replicated
> > (or distributed-replicated) configuration? E.g., when a brick in a
> > mirror set dies, the system rebuilds it automatically on a spare,
> > similar to how it is done by RAID controllers.
> >
> > Not only would it improve practical reliability, especially of large
> > clusters, it would also make it possible to build better-performing
> > clusters from less expensive components. For example, instead of
> > having slow RAID5 bricks on expensive RAID controllers, one uses
> > cheap HBAs and stripes a few disks per brick in RAID0 - that is
> > faster for writes than RAID 5/6 by an order of magnitude (and, by
> > the way, should improve the rebuild times in Gluster that many are
> > complaining about). A failure of one such striped brick is not
> > catastrophic in a mirrored Gluster - but it is better to have spare
> > bricks standing by, strewn across the cluster heads.
> >
> > A more advanced setup at the hardware level involves creating
> > "hybrid disks", where HDD vdisks are cached by enterprise-class
> > SSDs. It works beautifully and makes HDDs amazingly fast for random
> > transactions. The technology has become widely available on many
> > $500 COTS controllers. However, it is not widely known that the
> > results with HDDs in RAID0 under an SSD cache are 10 to 20 (!!)
> > times better than with RAID 5 or 6.
> >
> > There is no way to use RAID0 in commercial storage, the main reason
> > being the absence of hot spares. If, on the other hand, the spares
> > are handled by Gluster in the form of pre-fabricated (cached
> > hardware-RAID0) bricks, both very good performance and reasonably
> > sufficient redundancy should be easily achieved.
>
> Why not use the "gluster volume replace-brick ..." command? You can
> use external monitoring/management tools (e.g. freeipmi) to detect
> node failures and trigger replace-brick through a script (a rough
> sketch of such a script follows below). GlusterFS has the mechanism
> for hot spares, but the policy should be external.
>
> [AS] That should work, but it would still be prone to human error. In
> our experience, if we had not had hot spares (block storage) we would
> surely have experienced catastrophic failures. First off, COTS disks
> (and controllers, if we talk GlusterFS nodes) have a break-in period
> during which the bad ones fail under load within a few months.
> Secondly, a lot of our equipment is in remote telco facilities where
> power, cleanliness or air conditioning can be far from ideal -
> leading to increasing failure rates about 2 years after deployment.
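A minimal sketch of what such an external monitor-and-replace script could look like, for illustration only. The volume name, brick paths and the ping-based health check below are placeholders (in practice freeipmi or any other monitor would supply the failure signal), and the exact replace-brick sub-command syntax differs between GlusterFS releases:

#!/usr/bin/env python
# Illustrative only: an external hot-spare policy driven by monitoring,
# as suggested above. Volume name, brick paths and the health check are
# placeholders, not anything GlusterFS provides by itself.
import subprocess
import time

VOLUME = "vol0"                    # hypothetical replicated volume
FAILED = "server2:/bricks/b1"      # brick on the node being watched
SPARE = "server5:/bricks/spare1"   # pre-fabricated spare brick

def node_is_dead(brick):
    """Stand-in health check; plug in freeipmi, ping, or peer status here."""
    host = brick.split(":")[0]
    return subprocess.call(["ping", "-c", "3", "-W", "2", host]) != 0

def replace_with_spare(volume, failed, spare):
    # Recent GlusterFS versions take "commit force"; older releases used a
    # "start" / "status" / "commit" sequence instead -- adjust as needed.
    subprocess.check_call(["gluster", "volume", "replace-brick",
                           volume, failed, spare, "commit", "force"])

if __name__ == "__main__":
    while True:
        if node_is_dead(FAILED):
            replace_with_spare(VOLUME, FAILED, SPARE)
            break
        time.sleep(60)

The point of keeping this outside GlusterFS is exactly the one made above: the mechanism (replace-brick) already exists, while the policy - when a node counts as dead, whether spares are currently enabled - stays in a script the operator controls.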
> As a rule, we have at least 4 hot spares per two 24-bay enclosures,
> while our sister company with a similar use profile keeps 4-6 spares
> per enclosure, as they run older and less uniform equipment.
>
> A node may come back online in 5 minutes; GlusterFS should not
> automatically make decisions.
>
> [AS] Good point - e.g. a node that is down for maintenance.
>
> I am wondering whether it makes sense to add hot spares as a standard
> feature, since GlusterFS detects failures.
>
> [AS] Given the reason above, it would be best if the feature could be
> turned on and off. Before attempting maintenance - turn it off. Once
> maintenance is complete and the node is up - the "turn hot spare on"
> command is issued, but it is queued until the reconstruction of the
> node begins, and takes it into consideration (it won't attempt to
> sync to spare bricks in case reconstruction to other good bricks has
> already begun).
>
> In about half the cases, disks and controllers fail randomly and
> temporarily (due to dust, bad power, etc.). Most of the time the root
> cause is unknown or is impractical to debug in a live system.
> Block-storage SANs have more or less standard configuration tools
> that take this into account. Here's a brief description in their
> terminology, which may help in creating the logic in GlusterFS (a
> rough sketch restating this model in brick terms follows below):
>
> 1. Drives can have the statuses Online, Unconfigured Good,
>    Unconfigured Bad, Spare (LSP, a spare local to the drive group),
>    Global Spare (GSP, available across the system) and Foreign.
> 2. vDisks can be Optimal, Degraded, or Degraded (Rebuilding).
> 3. In the presence of spares, if a drive in a redundant vDisk fails,
>    the system marks the drive as Unconfigured Bad, and the vDisk
>    picks up the spare and enters Rebuilding mode.
> 4. The system won't let you make an Unconfigured Bad drive Online,
>    but you can try a "make unconfigured good" command on it. If that
>    succeeds, it passes initialization, and SMART shows no trouble -
>    include it in a new vDisk, make it a spare, etc. If it's bad -
>    replace it.

Very useful points. Took notes. -ab
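For reference, here is the status model from points 1-4 above restated in brick terms, as a rough sketch only. None of these classes, states or functions exist in GlusterFS; the names are made up to mirror the controller logic described in the list:

# Rough sketch only: the controller status model from points 1-4 above,
# translated into hypothetical brick/volume terms. Nothing here is an
# actual GlusterFS API.
from dataclasses import dataclass
from enum import Enum

class BrickState(Enum):
    ONLINE = "online"                 # serving a replica set
    UNCONFIGURED_GOOD = "uncfg_good"  # healthy, eligible to become a spare
    UNCONFIGURED_BAD = "uncfg_bad"    # failed; must be re-qualified first
    SPARE_LOCAL = "spare_local"       # LSP: reserved for one replica set
    SPARE_GLOBAL = "spare_global"     # GSP: reserved for the whole cluster

class VolumeState(Enum):
    OPTIMAL = "optimal"
    DEGRADED = "degraded"
    REBUILDING = "rebuilding"

@dataclass
class Brick:
    path: str
    state: BrickState

@dataclass
class Volume:
    name: str
    state: VolumeState = VolumeState.OPTIMAL

def on_brick_failure(volume, failed, spares):
    """Point 3: mark the failed brick bad, pull in a spare, start rebuilding."""
    failed.state = BrickState.UNCONFIGURED_BAD
    volume.state = VolumeState.DEGRADED
    for spare in spares:
        if spare.state in (BrickState.SPARE_LOCAL, BrickState.SPARE_GLOBAL):
            spare.state = BrickState.ONLINE
            volume.state = VolumeState.REBUILDING
            return spare
    return None  # no spare available: the volume stays degraded

def requalify(brick, passes_init_and_smart):
    """Point 4: a bad brick never goes straight back online; it can only be
    re-qualified as Unconfigured Good (then reused as a spare) or replaced."""
    if brick.state is BrickState.UNCONFIGURED_BAD and passes_init_and_smart:
        brick.state = BrickState.UNCONFIGURED_GOOD
    return brick.state

With a model like this, the "turn hot spare on/off" switch discussed earlier would simply gate whether on_brick_failure is allowed to promote a spare while a node is down for maintenance.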