Brian, thanks for this response.

Since we are on the subject of hardware, what would be the perfect fit for
a gluster brick?  We were looking at a PowerEdge C2100 Rack Server.

During testing I found it pretty easy to saturate 1 Gig network links.
This was also the case when multiple links were bonded together.  Are
there any cheap 10 gig switch alternatives that anyone would suggest?

Matt

On 8/24/12 4:28 PM, Brian Candler wrote:
> On Fri, Aug 24, 2012 at 10:51:24AM -0500, Matt Weil wrote:
>> I am curious what is used typically for the file system replication
>> and how do you make sure that it is consistent.
>>
>> So for example when using large 3TB+ sata/NL-sas drives. Is it
>> typical to replicate three times to get similar protection to raid
>> 6?
>
> Gluster sits on top of existing filesystems on the storage bricks, so it's
> fine to continue to use RAID10 (for performance) or RAID6 (for capacity) on
> those nodes.  Gluster replicated volumes, and/or gluster geo-replication,
> then give you an additional layer of replication on top of that, and the
> ability to handle entire servers going out of service.
>
> If I were you, I would not want to have a non-resilient array like a RAID0
> on my storage bricks.
>
> Whilst in principle you could have lots of separate 3TB filesystems and put
> them into a large distributed/replicated set, I think this is likely to be
> difficult to manage.  In particular, the process of replacing a failed disk
> requires more skill than a simple RAID drive swap.
>
> One word of warning: when choosing 3TB SATA drives, ensure they support
> error recovery control (a.k.a. time-limited error recovery).
>
> Enterprise drives do, but many consumer ones don't.  The Hitachi consumer
> ones do, for now anyway; Seagate ones do not.
>
> To attempt to enable it on a particular drive:
>
>     # smartctl -l scterc,70,70 /dev/sda
>
> If the drive supports it, you'll see:
>
>     SCT Error Recovery Control set to:
>                Read: 70 (7.0 seconds)
>               Write: 70 (7.0 seconds)
>
> There's plenty of discussion on the linux-raid mailing list if you want to
> go through the archives.
>
>> Also what is typically done to ensure that all replicas are in place
>> and consistent?  A cron that stats or ls's the file system from a
>> single client?
>
> I don't have a good answer to that.  Stat'ing all files recursively used to
> be required for gluster <3.3 to force healing.  As of gluster 3.3, there is
> a self-healing daemon which handles this automatically.  So basically, you
> trust gluster to do its job.
>
> I guess there could be value in running a recursive md5sum on each replica
> locally and comparing the results (but you'd have to allow for files which
> were in the process of changing during the scan).
>
> Regards,
>
> Brian.
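
For reference, a minimal sketch of the layout Brian describes above: each
server's RAID array holds one brick filesystem, and a gluster replicated
volume is created across the servers.  The hostnames, brick path and
volume name here are placeholders, not anything from this thread:

    # On each server the RAID6/RAID10 array is already formatted and
    # mounted as a brick, e.g. at /export/brick1 (example path).

    # Create a two-way replicated volume across two servers:
    gluster volume create myvol replica 2 \
        server1:/export/brick1 server2:/export/brick1
    gluster volume start myvol

    # Clients mount the volume, not the individual bricks:
    mount -t glusterfs server1:/myvol /mnt/myvol

A distributed-replicated volume is the same command with more brick
pairs listed after "replica 2".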
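
On the error recovery control point, the setting can also be queried
without changing it, and on most drives it does not survive a power
cycle, so it is commonly re-applied at boot.  Device names below are
placeholders:

    # Query the current SCT ERC setting (read-only):
    smartctl -l scterc /dev/sda

    # Re-apply at boot, e.g. from rc.local; adjust the device list
    # to match the disks behind your bricks:
    for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
        smartctl -l scterc,70,70 "$dev"
    done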
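
On the self-healing daemon: rather than a recursive stat from cron,
gluster 3.3 lets you ask the daemon itself what still needs healing.
"myvol" is a placeholder volume name:

    # Entries currently queued for self-heal:
    gluster volume heal myvol info

    # Entries that failed to heal, and split-brain entries:
    gluster volume heal myvol info heal-failed
    gluster volume heal myvol info split-brain

    # Kick off a full self-heal crawl manually:
    gluster volume heal myvol full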
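
And a rough sketch of the per-replica checksum comparison Brian
suggests, run on each brick's local filesystem.  Paths are examples,
and files changing during the scan will show up as false differences,
as he notes:

    # On each replica, checksum the brick contents, skipping gluster's
    # internal .glusterfs directory:
    cd /export/brick1
    find . -path ./.glusterfs -prune -o -type f -print0 \
        | sort -z | xargs -0 md5sum > /tmp/brick1.md5

    # Copy the result files to one host and compare:
    diff serverA-brick1.md5 serverB-brick1.md5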