On Fri, Aug 24, 2012 at 10:51:24AM -0500, Matt Weil wrote:
> I am curious what is used typically for the file system replication
> and how do you make sure that it is consistent.
>
> So for example when using large 3TB+ SATA/NL-SAS drives, is it
> typical to replicate three times to get similar protection to
> RAID 6?

Gluster sits on top of existing filesystems on the storage bricks, so
it's fine to continue to use RAID10 (for performance) or RAID6 (for
capacity) on those nodes. Gluster replicated volumes, and/or gluster
geo-replication, then give you an additional layer of replication on
top of that, and the ability to handle entire servers going out of
service.

If I were you, I would not want a non-resilient array like RAID0 on
my storage bricks. Whilst in principle you could have lots of
separate 3TB filesystems and put them into a large
distributed/replicated set, I think this is likely to be difficult
to manage. In particular, replacing a failed disk requires more
skill than a simple RAID drive swap.

One word of warning: when choosing 3TB SATA drives, ensure they
support error recovery control (a.k.a. time-limited error recovery).
Enterprise drives do, but many consumer ones don't. The Hitachi
consumer ones do, for now anyway; Seagate ones do not.

To attempt to enable it on a particular drive:

  # smartctl -l scterc,70,70 /dev/sda

If the drive supports it, you'll see:

  SCT Error Recovery Control set to:
             Read: 70 (7.0 seconds)
            Write: 70 (7.0 seconds)

There's plenty of discussion on the linux-raid mailing list if you
want to go through the archives.

> Also what is typically done to ensure that all replicas are in
> place and consistent? A cron job that stats or ls's the file
> system from a single client?

I don't have a good answer to that. Stat'ing all files recursively
used to be required for gluster <3.3 to force healing. As of gluster
3.3, there is a self-healing daemon which handles this automatically.
So basically, you trust gluster to do its job.

I guess there could be value in running a recursive md5sum on each
replica locally and comparing the results (but you'd have to allow
for files which were in the process of changing during the scan).

Regards,

Brian.
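P.S. A few sketches, in case they help. For the layered setup
described above, the bricks are the RAID-backed local filesystems,
and the extra replication happens when you create the volume, along
these lines (server and brick names here are hypothetical):

  gluster volume create myvol replica 2 \
      server1:/export/brick1 server2:/export/brick1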
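One more thing on scterc: on many drives the setting does not survive
a power cycle, so you may want to re-apply it at boot. A minimal
sketch, e.g. from /etc/rc.local (the device list is hypothetical;
substitute the members of your own array):

  for d in /dev/sd[a-f]; do
      smartctl -l scterc,70,70 "$d"
  done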
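Also, if you do end up on gluster <3.3, the usual recipe for forcing
a heal was a recursive stat from a client mount; on 3.3 you can
instead ask the self-heal daemon what is pending. Untested sketches
(the mount point and volume name are hypothetical):

  # gluster <3.3: stat every file from a client mount to trigger healing
  find /mnt/glustervol -noleaf -print0 | xargs --null stat >/dev/null

  # gluster 3.3: list entries the self-heal daemon still has to handle
  gluster volume heal myvol info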
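And for the recursive md5sum idea, roughly what I had in mind, run
locally on each brick (the brick path is hypothetical, and files
changing mid-scan will show up as false differences):

  #!/bin/sh
  # Untested sketch: checksum every file under a brick, sorted so the
  # output is directly comparable between replicas. Skips gluster's
  # internal .glusterfs directory (present on 3.3 bricks).
  BRICK=/export/brick1
  cd "$BRICK" || exit 1
  find . -path ./.glusterfs -prune -o -type f -print0 | sort -z \
      | xargs -0 md5sum > "/tmp/$(hostname).md5"
  # Then copy both lists to one machine and compare, e.g.:
  #   diff serverA.md5 serverB.md5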