Arvids & Larry,

Interesting read, Arvids. And I completely agree. On our large raid6 array it takes a week to rebuild the array from any ONE drive failure. It's a scary time when doing a rebuild because of the decreased performance from the array and the increased chance of a full raid failure if we lost another two drives. Makes for a very long week of nail biting.

Larry brought up some great points. I too have been burned way too many times by raid5 and only use it if I absolutely have to. I normally stick to raid1/10 or raid6/60. Even with my huge raid6 rebuild time of a week, it's still faster to do that than to have Gluster resync everything. The raid rebuild does affect the performance of the box, but so would a Gluster rebuild.

As for Larry's point #4, I duplicate the data across two boxes using Gluster replication (cluster/replicate) on top of raid6. So each box has a large raid6 set and the data is duplicated between the two (rough sketch of the vol file at the bottom of this mail). That way, if I did lose a whole raid array for whatever reason, I can still recover with Gluster.

I've also been frowned upon for using desktop drives in our servers - but on the bright side I've had very few problems with them. Of course, it did take buying a bunch of different raid cards and drives before finding a combination that played well together. We currently have 240 Seagate 1.5tb desktop drives in our two Gluster clusters and have only had to replace three in the last year - two that just died and one that started getting SMART errors, so it was replaced. I haven't had a problem getting Seagate to replace the drives - as they fail I ship them off to Seagate and they send me a new one. I did figure we would have to do support in house, so we bought lots of spare parts when we ordered everything. It was still way cheaper to buy desktop drives and Supermicro servers with lots of spare parts than shopping at Dell, HP or Sun - by more than half.

Honestly my biggest peeve with Gluster is the rebuild process. Take the OneFS file system in Isilon clusters - they are able to rebuild at the block level, only replicating information that has changed. So even with one node being offline all day, a rebuild/resync operation is very quick. And having 30 billion files or 10 huge ones makes no difference to resync speed. With Gluster, on the other hand, a huge directory tree and number of files can take days if not weeks to finish. Of course, since Gluster runs on top of a normal filesystem such as xfs/ext3/zfs, getting access to block-level replication may be tricky. I honestly would not be against the Gluster team modifying the xfs/ext3/whatever filesystem so they could tailor it more to their own needs - which of course would make it far less portable and much more difficult to install and configure...

Whatever the solution is, I can tell you that the rebuild issues will only get worse as drives continue to get larger and the number of files/directories continues to grow. Sun's ZFS filesystem goes a long way toward fixing some of these problems; I just wish they would port it over to Linux.

liam

On Tue, Jan 5, 2010 at 2:17 PM, Arvids Godjuks <arvids.godjuks at gmail.com> wrote:
> Consider this - a rebuild of 1.5-2 TB HDD in raid5/6 array can easily
> take up to few days to complete. At that moment your storage at that
> node will not perform well. I read a week ago very good article with
> research of this area, only thing it's in russian, but it mentions a
> few english sources too. Maybe google translate will help.
> Here's the original link: http://habrahabr.ru/blogs/hardware/78311/
> Here's the google translate version:
> http://translate.google.com/translate?js=y&prev=_t&hl=en&ie=UTF-8&layout=1&eotf=1&u=http%3A%2F%2Fhabrahabr.ru%2Fblogs%2Fhardware%2F78311%2F&sl=ru&tl=en
> (looks quite neat by the way)
>
> 2010/1/5 Liam Slusser <lslusser at gmail.com>:
>> Larry & All,
>>
>> I would much rather rebuild a bad drive with a raid controller than
>> have to wait for Gluster to do it. With a large number of files, doing
>> a ls -aglR can take weeks. Also you don't NEED enterprise drives with
>> a raid controller; I use desktop 1.5tb Seagate drives which are happy as a
>> clam on a 3ware SAS card under a SAS expander.
>>
>> liam
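
P.S. For anyone curious about the two-box setup I mentioned above, here is roughly what the client-side vol file looks like. This is only a trimmed-down sketch - the hostnames (storage1/storage2) and the brick name (brick1) are made up, and a real config would have the performance translators and more bricks layered on top - but it shows the cluster/replicate piece that mirrors the two raid6 boxes:

# client connection to box 1 (its brick sits on that box's raid6 set)
volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host storage1
  option remote-subvolume brick1
end-volume

# client connection to box 2 (same layout, second raid6 set)
volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host storage2
  option remote-subvolume brick1
end-volume

# replicate (AFR) - every write goes to both boxes
volume mirror0
  type cluster/replicate
  subvolumes remote1 remote2
end-volume

If one whole box (or its raid6 array) dies, the other replica keeps serving, and once the box is rebuilt the self-heal walks the whole tree to copy everything back - which is exactly the slow part I was complaining about above.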