I went with big RAID on each node (16x 3TB SATA disks in RAID6 with a
hot spare per node) rather than brick-per-disk. The simple reason is
that I wanted to configure distribute+replicate at the GlusterFS level
and be 100% guaranteed that replication happened across to another
node, and not to another brick on the same node. As each node has only
one giant brick, the cluster is forced to replicate to a separate node
every time (there's a sketch of that layout at the end of this post).
Some careful initial setup could probably have achieved the same thing,
but I wanted to avoid the drama of my employer expanding the cluster
one node at a time later on and causing that design goal to fail, as a
new single node with many bricks could find replication partners on
itself.

On a different topic, I find no real-world difference between RAID10
and RAID6 with GlusterFS. Most of the access delay in Gluster has
little to do with the speed of the disks. The only downside to RAID6 is
a long rebuild time if you're unlucky enough to blow a couple of drives
at once. RAID50 might be a better choice if you're up at 20 drives per
node.

We invested in SSD caching on our nodes, and to be honest it was rather
pointless. Certainly not bad, but the real-world speed boost isn't
noticed by end users.

-Dan

----------------
Dan Mons
R&D SysAdmin
Unbreaker of broken things
Cutting Edge
http://cuttingedge.com.au

On 10 December 2013 05:31, Ben Turner <bturner@xxxxxxxxxx> wrote:
> ----- Original Message -----
>> From: "Ben Turner" <bturner@xxxxxxxxxx>
>> To: "Heiko Krämer" <hkraemer@xxxxxxxxxxx>
>> Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
>> Sent: Monday, December 9, 2013 2:26:45 PM
>> Subject: Re: Gluster infrastructure question
>>
>> ----- Original Message -----
>> > From: "Heiko Krämer" <hkraemer@xxxxxxxxxxx>
>> > To: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
>> > Sent: Monday, December 9, 2013 8:18:28 AM
>> > Subject: Gluster infrastructure question
>> >
>> > Heyho guys,
>> >
>> > I've been running glusterfs for years in a small environment without
>> > big problems.
>> >
>> > Now I'm going to use GlusterFS for a bigger cluster, but I have some
>> > questions :)
>> >
>> > Environment:
>> > * 4 servers
>> > * 20 x 2TB HDDs each
>> > * RAID controller
>> > * RAID 10
>> > * 4 bricks => Replicated, Distributed volume
>> > * Gluster 3.4
>> >
>> > 1)
>> > I'm wondering whether I can drop the RAID 10 on each server and
>> > create a separate brick for each HDD. In that case the volume would
>> > have 80 bricks (4 servers x 20 HDDs). Is there any experience with
>> > write throughput in a production system with that many bricks? In
>> > addition, I'd get double the HDD capacity.
>>
>> Have a look at:
>>
>> http://rhsummit.files.wordpress.com/2012/03/england-rhs-performance.pdf
>
> That one was from 2012, here is the latest:
>
> http://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf
>
> -b
>
>> Specifically:
>>
>> ● RAID arrays
>> ● More RAID LUNs for better concurrency
>> ● For RAID6, 256-KB stripe size
>>
>> I use a single RAID 6 that is divided into several LUNs for my bricks.
>> For example, on my Dell servers (with PERC6 RAID controllers) each
>> server has 12 disks that I put into RAID 6. Then I break the RAID 6
>> into 6 LUNs and create a new PV/VG/LV for each brick. From there I
>> follow the recommendations listed in the presentation.
>>
>> HTH!
>>
>> -b
>>
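(A rough sketch of the LUN-per-brick layout Ben describes above, using
the 256-KB stripe recommendation from the presentation. The device
name, VG/LV names, mount point and the sw=10 value (10 data disks in a
12-disk RAID 6) are assumptions for illustration, not taken from his
actual setup.)

  # One brick per RAID LUN; each LUN appears as its own block device.
  pvcreate /dev/sdb
  vgcreate vg_brick1 /dev/sdb
  lvcreate -n lv_brick1 -l 100%FREE vg_brick1

  # XFS with 512-byte inodes (room for Gluster's xattrs) and stripe
  # alignment for a 256-KB stripe unit across 10 data disks.
  mkfs.xfs -i size=512 -d su=256k,sw=10 /dev/vg_brick1/lv_brick1

  mkdir -p /bricks/brick1
  mount -o noatime,inode64 /dev/vg_brick1/lv_brick1 /bricks/brick1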
>> > 2)
>> > I've heard a talk about GlusterFS and scaling out. The main point
>> > was that if more bricks are in use, the scale-out process takes a
>> > long time; the problem was/is the hash algorithm. So I'm wondering
>> > which is faster, one very big brick (RAID 10, 20TB on each server)
>> > or many more bricks, and whether there are any issues. Does anyone
>> > have experience with this?
>> >
>> > 3)
>> > Failover of an HDD is not a big deal for a RAID controller with a
>> > hot-spare HDD. GlusterFS will rebuild automatically if a brick fails
>> > and no data is present; this will generate a lot of network traffic
>> > between the mirror bricks, but it will handle it much like the RAID
>> > controller does, right?
>> >
>> >
>> > Thanks and cheers
>> > Heiko
>> >
>> > --
>> > Anynines.com
>> >
>> > Avarteq GmbH
>> > B.Sc. Informatik
>> > Heiko Krämer
>> > CIO
>> > Twitter: @anynines
>> >
>> > ----
>> > Managing directors: Alexander Faißt, Dipl.-Inf.(FH) Julian Fischer
>> > Commercial register: AG Saarbrücken HRB 17413, VAT ID: DE262633168
>> > Registered office: Saarbrücken

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
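(As mentioned at the top, a minimal sketch of the layout Dan describes:
four nodes, one large RAID-backed brick each, distribute+replicate with
replica 2. The hostnames, volume name and brick paths are made up.
Gluster forms replica sets from consecutive bricks in the order they
are listed, so with exactly one brick per node every replica pair is
guaranteed to span two nodes.)

  # Consecutive bricks form the replica pairs: (node1, node2) and
  # (node3, node4); files are then distributed across the two pairs.
  gluster volume create bigvol replica 2 \
      node1:/bricks/brick1/data \
      node2:/bricks/brick1/data \
      node3:/bricks/brick1/data \
      node4:/bricks/brick1/data
  gluster volume start bigvol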