gluster rebuild time

All,

I've been asked to share some rebuild times on my large Gluster
cluster.  I recently added more storage (bricks) and did a full
ls -alR on the whole system to force a self-heal of everything.  I
estimate we have around 50 million files and directories.
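
In case it helps anyone, the "rebuild" here is nothing fancier than a
timed recursive listing run from a native client, which (in 2.x at
least) is what triggers the replicate self-heal on every file it
touches.  A minimal sketch, assuming the volume is mounted at
/mnt/gluster (that mount point is made up for the example):

  # run from a native (FUSE) client mount; stat'ing every file forces self-heal
  time ls -alR /mnt/gluster > /dev/null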

Gluster Server Hardware:

2 x Supermicro 4U chassis, each with 24 1.5TB SATA drives plus another
24 1.5TB SATA drives in an external drive array attached via SAS (96
drives altogether), 8-core 2.5GHz Xeon, 8GB RAM
3ware RAID controllers, 24 drives per RAID6 array, 4 arrays total, 2
arrays per server
CentOS 5.3 64-bit
XFS with the inode64 mount option (example mount entry after this list)
Gluster 2.0.9
Bonded gigabit Ethernet
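
The inode64 bit is just a mount option on the brick filesystems; a
minimal sketch of what one of those fstab entries might look like (the
device name and mount point are made up for the example):

  # /etc/fstab entry for one brick filesystem (hypothetical device and path)
  /dev/sdb1  /export/brick1  xfs  defaults,inode64  0 0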

Clients:

20 or so Dell 1950 clients
A mixture of Red Hat ES4 and CentOS 5 clients, plus 20 Windows XP
clients via Samba (these are VMs running the jobs that "have to run on
Windows"; see the Samba sketch after this list)
All clients on gigabit Ethernet
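
The Windows boxes go through Samba rather than the native client;
presumably that is just Samba re-exporting a FUSE client mount.  A
minimal sketch of what such a share might look like, again assuming a
made-up /mnt/gluster mount point and share name:

  # /etc/samba/smb.conf (hypothetical share re-exporting the FUSE mount)
  [gluster]
      path = /mnt/gluster
      read only = no
      browseable = yes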

I should say that the load on our Gluster servers is normally very
high; the load average on each box is anywhere from 7-10 at peak
(though service times are still decent), so I'm sure the rebuild would
have been quicker on a more idle system.  The system is at its highest
load when it's writing a large amount of data during the peak of the
day, so I try to schedule jobs around our peak times.

Anyhow...

I started the job sometime on January 16th and it JUST finished... 18 days later.

real    27229m56.894s
user    13m19.833s
sys     56m51.277s

Finish date was Wed Feb  3 23:33:12 PST 2010
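
(For those keeping score, 27,229 minutes divided by 1,440 minutes per
day is roughly 18.9 days of wall-clock time, which lines up with the
dates above.)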

Now, I know some people have mentioned that Gluster is happier with
many small bricks instead of the large RAID arrays I use, but either
way I'd be stuck doing an ls -aglR, which takes forever.  So I'd rather
add a huge amount of space at once and keep the system setup similar,
and let my 3ware controllers deal with drive failures instead of
having to do an ls -aglR every time I lose a drive.  Replacing a drive
with the 3ware controller takes 7 to 8 days in a 24-drive RAID6 array,
but that's better than 18 days for Gluster to do an ls -aglR.

By comparison, our old 14-node Isilon 6000 cluster (6TB per node) did a
node rebuild/resync in about a day or two; there's a big difference
between block-level and filesystem-level replication!

We're still running Gluster 2.0.9, but I am looking to upgrade to 3.0
once a few more releases are out, and I'm hoping that the new
checksum-based checks will speed up this whole process.  Once I have
some numbers on 3.0 I'll be sure to share.

thanks,
liam

