I'm new to Gluster, and have some questions

daemons at kanuka.com.au (Daniel Mons) · Sat, 23 Oct 2010 00:37:32 +1000

On Fri, Oct 22, 2010 at 10:55 AM, Horacio Sanson <hsanson at gmail.com> wrote:
> Distributed volume: Aggregates the storage of several directories (bricks in
> gluster terms) among several computers. The benefit is that you can
> grow/shrink the volume as you please. The bad part is that this offers no
> performance/reliability guarantees as files are stored randomly among the
> disks in the volume.
>
> Replicated volume: Requires a minimum of 2 bricks on separate servers. All
> files are replicated among the bricks. The number of replicas is configured
> at volume creation. Has all the benefits of a Distributed volume plus
> resilience to failure.
>
> Stripe volume: Requires a minimum of 2 bricks on separate servers. All files
> are split into stripes and these stripes are distributed among the bricks of
> the volume. The number of stripes and their size are configured at volume
> creation. Has all the benefits of Replicated volume plus reliability and can
> improve read performance for large files as the read is distributed among
> several machines.

Two comments:

1) Stripe by itself offers no redundancy.  You mention that it has
"all the benefits of replication" - it actually doesn't.  If you use
only stripe and lose a brick, your data is corrupt (say you have 4
nodes and 1 is lost, you only have 3/4 of every file stored, which is
pretty useless to you).  Consider this something akin to RAID0.
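
A rough way to see where the 3/4 figure comes from is to model stripe as
round-robin placement of fixed-size chunks.  The sketch below is illustrative
Python, not GlusterFS code: the function names and the 128 KiB chunk size are
assumptions made up for the example.

```python
def stripe_layout(file_size, chunk_size, brick_count):
    """Map each chunk index to a brick, round-robin (a simplified stripe model)."""
    chunk_count = -(-file_size // chunk_size)  # ceiling division
    return [i % brick_count for i in range(chunk_count)]


def surviving_fraction(layout, lost_brick):
    """Fraction of a file's chunks still readable after one brick is lost."""
    kept = sum(1 for brick in layout if brick != lost_brick)
    return kept / len(layout)


# Assumed example: a 1 MiB file in 128 KiB chunks over 4 bricks (8 chunks).
layout = stripe_layout(1024 * 1024, 128 * 1024, 4)
print(layout)                                    # [0, 1, 2, 3, 0, 1, 2, 3]
print(surviving_fraction(layout, lost_brick=2))  # 0.75
```

Every file larger than one chunk has holes in it after the loss, which is why
the remaining 3/4 is of little practical use.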

2) You can, however, mix and match these translators to suit your
needs.  I'm designing a site at the moment where pairs of nodes
are set up in replicate, and all data is then striped across
the replicate pairs.  This is somewhat like the concept of RAID10.

To answer the original poster's question of "how does the data spread
itself?": that's up to you.  My design is to have replicate
pairs, and stripe across many of these.  You could instead do the
reverse, and have striped pairs which all data would replicate over.
If you think about it, the latter ends up with less usable storage and
no real speed gain.  The former ensures that as new storage bricks are
added, data is striped across more pairs, and the overall speed
benefit is greater.
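
The stripe-over-replicate-pairs layout can be sketched in the same simplified
way.  This is only a model of the idea, not how GlusterFS implements it; the
brick names and helper functions are hypothetical.

```python
def replicated_stripe_layout(chunk_count, pairs):
    """Place chunk i on both bricks of pair (i mod number of pairs)."""
    return [pairs[i % len(pairs)] for i in range(chunk_count)]


def readable(layout, lost_bricks):
    """A file is readable if every chunk has at least one surviving copy."""
    return all(
        any(brick not in lost_bricks for brick in copies)
        for copies in layout
    )


pairs = [("a1", "a2"), ("b1", "b2")]       # hypothetical brick names
layout = replicated_stripe_layout(4, pairs)
print(readable(layout, {"a1"}))            # True: a2 still holds the copy
print(readable(layout, {"a1", "a2"}))      # False: a whole pair is gone
```

Adding a third pair to `pairs` spreads the same chunks over three pairs
instead of two, which is the "striped across more pairs" effect described
above.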

One thing to consider also is that striping means your data is broken
into chunks and spread around the cluster.  Should something go awry
(either physically or logically), then your data could potentially be
lost.  The "distribute" translator is slightly safer in this regard.
If worst comes to worst and you suffer either a logical or physical
error destroying part of your data, it's a simple task to just
manually mount up the underlying file system and recover at least some
of your data (as bricks store only whole files).
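
The difference is easy to see in a toy model of "distribute", where each
whole file lands on exactly one brick chosen by hashing its name.  The CRC32
hash below is a stand-in chosen for the example, not the hash GlusterFS
actually uses, and the file names are made up.

```python
import zlib


def distribute(file_names, brick_count):
    """Assign each whole file to one brick by hashing its name."""
    bricks = [[] for _ in range(brick_count)]
    for name in file_names:
        index = zlib.crc32(name.encode("utf-8")) % brick_count
        bricks[index].append(name)
    return bricks


def recoverable(bricks, lost_brick):
    """Files that survive intact on the remaining bricks."""
    return sorted(
        name
        for index, brick in enumerate(bricks)
        if index != lost_brick
        for name in brick
    )


files = ["report.doc", "logo.png", "notes.txt", "backup.tar", "index.html"]
bricks = distribute(files, 3)
print(recoverable(bricks, lost_brick=0))
```

Whatever is printed is a list of complete, usable files, in contrast to a
striped volume, where losing one brick damages every file that spans it.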

With that in mind, the "stripe" translator is best suited to sites
where very large files are accessed frequently by many clients.  I'm
planning it for a site where a few 1TB files need to be read in by 30
clients quasi-simultaneously.  Starting each client off at slightly
different times (even a few seconds apart) means they should
theoretically be reading different chunks from different bricks, and
the overall bandwidth of the cluster will not bottleneck at any one
point.  Compare this to a single NFS server with all 30 clients
smashing it for the same file, and GlusterFS with stripe is clearly a
better option.
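
The staggered-start idea can be illustrated with the same round-robin model.
The numbers here (1 MB chunks, each client reading 1 MB per second, 4 bricks)
are assumptions picked to keep the arithmetic readable, not measurements, and
real clients will not read at a perfectly constant rate.

```python
def brick_for_offset(offset, chunk_size, brick_count):
    """Brick holding the chunk that contains this byte offset."""
    return (offset // chunk_size) % brick_count


def bricks_in_use(start_delays, now, read_rate, chunk_size, brick_count):
    """Which brick each client is reading from at time `now` (seconds)."""
    result = []
    for delay in start_delays:
        elapsed = now - delay
        if elapsed < 0:
            result.append(None)  # this client has not started yet
            continue
        offset = int(elapsed * read_rate)
        result.append(brick_for_offset(offset, chunk_size, brick_count))
    return result


MB = 1_000_000
# Four clients started one second apart hit four different bricks.
print(bricks_in_use([0, 1, 2, 3], 10, MB, MB, 4))  # [2, 1, 0, 3]
# Four clients started together all hit the same brick at once.
print(bricks_in_use([0, 0, 0, 0], 10, MB, MB, 4))  # [2, 2, 2, 2]
```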

If your site has many clients accessing relatively small files (even
up to a few hundred MB each) in an ad-hoc fashion, then "distribute"
is a much safer bet.  You'd most likely end up with as good
performance as "stripe" site-wide, and have the added benefit of being
able to manually recover files from a brick should something go wrong.

"Distribute" is certainly my pick for your average business that has
lots of unstructured data in the form of documents, images and the
like.  Ditto for large file stores for things like web farms and
whatnot.  As above, I'd only consider stripe where VERY large files
are accessed by many clients at the same time, and speed is of the
essence.

-Dan