> -----Original Message-----
> From: abperiasamy@xxxxxxxxx [mailto:abperiasamy@xxxxxxxxx] On Behalf Of
> Anand Babu Periasamy
> Sent: Wednesday, April 11, 2012 10:13 AM
> To: 7220022
> Cc: gluster-devel@xxxxxxxxxx
> Subject: Re: GlusterFS Spare Bricks?
>
> On Tue, Apr 10, 2012 at 1:39 AM, 7220022 <7220022@xxxxxxxxx> wrote:
> >
> > Are there plans to add provisioning of spare bricks in a replicated
> > (or distributed-replicated) configuration? E.g., when a brick in a
> > mirror set dies, the system rebuilds it automatically on a spare,
> > similar to how it's done by RAID controllers.
> >
> > Not only would that improve practical reliability, especially of
> > large clusters, it would also make it possible to build
> > better-performing clusters out of less expensive components. For
> > example, instead of having slow RAID5 bricks on expensive RAID
> > controllers, one uses cheap HBAs and stripes a few disks per brick
> > in RAID0 - that's faster for writes than RAID 5/6 by an order of
> > magnitude (and, by the way, should improve the rebuild times in
> > Gluster that many are complaining about). A failure of one such
> > striped brick is not catastrophic in a mirrored Gluster - but it's
> > better to have spare bricks standing by, spread across cluster heads.
> >
> > A more advanced setup at the hardware level involves creating
> > "hybrid disks", whereby HDD vdisks are cached by enterprise-class
> > SSDs. It works beautifully and makes HDDs amazingly fast for random
> > transactions. The technology has become widely available on many
> > $500 COTS controllers. However, it is not widely known that the
> > results with HDDs in RAID0 under an SSD cache are 10 to 20 (!!)
> > times better than with RAID 5 or 6.
> >
> > There is no way to use RAID0 in commercial storage, the main reason
> > being the absence of hot spares. If, on the other hand, the spares
> > are handled by Gluster in the form of pre-fabricated (cached
> > hardware-RAID0) bricks, both very good performance and reasonably
> > sufficient redundancy should be easily achieved.
>
> Why not use the "gluster volume replace-brick ..." command? You can
> use external monitoring/management tools (e.g. freeipmi) to detect
> node failures and trigger replace-brick through a script. GlusterFS
> has the mechanism for hot spares, but the policy should be external.
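>
> Something along these lines could serve as a starting point. This is
> an untested sketch only: the volume and brick names are placeholders,
> the liveness probe is just an example (plug in freeipmi or whatever
> monitoring you already run), and the exact replace-brick sub-commands
> depend on the GlusterFS release.
>
> #!/usr/bin/env python
> # Illustrative only: swap a failed brick for a pre-provisioned spare.
> import socket
> import subprocess
> import sys
>
> VOLUME = "myvol"                  # hypothetical volume name
> FAILED = "node1:/bricks/b1"       # brick reported dead by monitoring
> SPARE = "node5:/bricks/spare1"    # pre-fabricated spare brick
>
> def brick_host_down(host, port=24007, timeout=5):
>     """Crude liveness probe: can we still reach glusterd on the host?"""
>     try:
>         socket.create_connection((host, port), timeout).close()
>         return False
>     except socket.error:
>         return True
>
> if __name__ == "__main__":
>     if not brick_host_down(FAILED.split(":", 1)[0]):
>         sys.exit(0)  # node is reachable; leave the decision to a human
>     # Sub-commands differ between releases ("start"/"commit" in older
>     # ones, "commit force" in newer ones); see the gluster man page.
>     subprocess.check_call(["gluster", "volume", "replace-brick", VOLUME,
>                            FAILED, SPARE, "commit", "force"])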
>
> [AS] That should work, but it'd still be prone to human error. In our
> experience, if we hadn't had hot spares (block storage) we'd surely
> have experienced catastrophic failures. First off, COTS disks (and
> controllers, if we talk GlusterFS nodes) have a break-in period during
> which the bad ones fail under load within a few months. Secondly, a
> lot of our equipment is in remote telco facilities where power,
> cleanliness or air conditioning can be far from ideal - leading to
> increasing failure rates about 2 years after deployment. As a rule, we
> keep at least 4 hot spares per two 24-bay enclosures, while our sister
> company, with a similar use profile, keeps 4-6 spares per enclosure,
> as they run older and less uniform equipment.
>
> A node may come back online in 5 mins; GlusterFS should not
> automatically make decisions.
>
> [AS] Good point, e.g. down for maintenance.
>
> I am thinking whether it makes sense to add hot spare as a standard
> feature, because GlusterFS detects failures.
>
> [AS] Given the reasons above, it'd be best if the feature could be
> turned on and off. Before attempting maintenance - turn it off. Once
> maintenance is complete and the node is up, the "turn hot spare on"
> command is issued, but it's queued until the reconstruction of the
> node begins - and is taken into consideration then (it won't attempt
> to sync to spare bricks if reconstruction to other good bricks has
> already begun).
>
> In half the cases, failed disks and controllers fail randomly and
> temporarily (due to dust, bad power, etc.). Most of the time the root
> cause is unknown or is impractical to debug in a live system. Block
> storage SANs have more or less standard configuration tools that take
> that into account. Here's a brief description in their terminology,
> which may help in creating the logic in GlusterFS:
>
> 1. Drives can have the statuses Online, Unconfigured Good,
>    Unconfigured Bad, Spare (LSP, a spare local to the drive group),
>    Global Spare (GSP, available across the system) and Foreign.
> 2. vDisks can be Optimal, Degraded, or Degraded (Rebuilding).
> 3. In the presence of spares, if a drive in a redundant vDisk fails,
>    the system marks the drive as Unconfigured Bad, and the vDisk picks
>    up the spare and enters the Rebuilding mode.
> 4. The system won't let you make an Unconfigured Bad drive Online, but
>    you can try a "make unconfigured good" command on it. If that
>    succeeds, the drive passes initialization, and SMART shows no
>    trouble, include it in a new vDisk, make it a spare, etc. If it's
>    bad - replace it.

Very useful points. Took notes. -ab
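
P.S. Purely as a thought experiment (none of this is an existing
GlusterFS interface, and the names are made up for illustration), a
brick-side analogue of the controller states you list might look
roughly like this:

# Hypothetical sketch only: restates the SAN-style drive/vDisk states
# from the list above in terms of bricks and replica sets.
from dataclasses import dataclass
from enum import Enum
from typing import List

class BrickState(Enum):
    ONLINE = "online"                 # serving data in a replica set
    UNCONFIGURED_GOOD = "uncfg_good"  # healthy, not yet in any volume
    UNCONFIGURED_BAD = "uncfg_bad"    # failed; needs admin attention
    SPARE = "spare"                   # spare local to one replica set (LSP)
    GLOBAL_SPARE = "global_spare"     # spare usable anywhere (GSP)
    FOREIGN = "foreign"               # carries data from another volume

class ReplicaSetState(Enum):
    OPTIMAL = "optimal"
    DEGRADED = "degraded"
    REBUILDING = "rebuilding"

@dataclass
class Brick:
    name: str
    state: BrickState

@dataclass
class ReplicaSet:
    bricks: List[Brick]
    state: ReplicaSetState = ReplicaSetState.OPTIMAL

def on_brick_failure(rset: ReplicaSet, failed: Brick,
                     spares: List[Brick], hot_spare_enabled: bool) -> None:
    """Analogue of item 3 above: mark the failed brick bad and, if the
    hot-spare feature is switched on and a spare exists, pull it in and
    rebuild; otherwise run degraded until an admin intervenes."""
    failed.state = BrickState.UNCONFIGURED_BAD
    rset.bricks.remove(failed)
    if hot_spare_enabled and spares:
        spare = spares.pop(0)
        spare.state = BrickState.ONLINE
        rset.bricks.append(spare)
        rset.state = ReplicaSetState.REBUILDING
    else:
        rset.state = ReplicaSetState.DEGRADED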