Hi Liam,

* replies inline *

On Wed, Jan 6, 2010 at 7:07 AM, Liam Slusser <lslusser at gmail.com> wrote:
> Arvids & Larry,
>
> Interesting read, Arvids. And I completely agree. On our large raid6
> array it takes 1 week to rebuild the array from any ONE drive failure.
> It's a scary time when doing a rebuild because of the decreased
> performance from the array and the increased chance of a full raid
> failure if we lost another two drives. Makes for a very long week of
> nail biting.
>
> Larry brought up some great points. I, also, have been burned way too
> many times by raid5 and only use it if I absolutely have to. I
> normally stick to raid1/10 or raid6/60. Even with my huge raid6
> rebuild time of a week, it's still faster to do that than have Gluster
> resync everything. The raid rebuild does affect the performance of
> the box, but so would a Gluster rebuild.
>
> As for Larry's point #4, I duplicate the data across two boxes using
> cluster/replicate on top of raid6. So each box has a large raid6
> set and I dup the data between the two. That way, if for whatever
> reason I did lose a whole raid array, I can still recover with Gluster.
>
> I've also been frowned on for using desktop drives in our servers -
> but on the bright side I've had very few problems with them. Of
> course it did take buying a bunch of different raid cards and drives
> before finding a combination that played well together. We currently
> have 240 Seagate 1.5tb desktop drives in our two gluster clusters and
> have only had to replace three in the last year - two that just died
> and one that started to get SMART errors, so it was replaced. I
> haven't had a problem getting Seagate to replace the drives - as they
> fail I ship them off to Seagate and they send me a new one. I did
> figure we would have to do support in-house, so we bought lots of
> spare parts when we ordered everything. It was still way cheaper to
> buy desktop drives and Supermicro servers with lots of spare parts
> than shopping at Dell, HP or Sun - by more than half.
>
> Honestly my biggest peeve with Gluster is the rebuild process. Take
> the OneFS filesystem in Isilon clusters - they are able to rebuild at
> a block level, replicating only the information which has changed. So
> even with one node being offline all day, a rebuild/resync operation
> is very quick. And having 30 billion files or 10 huge ones makes no
> difference to resync speed. With Gluster, by contrast, a huge
> directory tree/number of files can take days if not weeks to finish.
> Of course, since Gluster runs on top of a normal filesystem such as
> xfs/ext3/zfs, having access to block-level replication may be tricky.
> I honestly would not be against the Gluster team modifying the
> xfs/ext3/whatever filesystem so they could tailor it more to their
> own needs - which of course would make it far less portable and much
> more difficult to install and configure...

GlusterFS has done checksum-based self-heal since the 3.0 release; I
believe your experiences are from 2.0, which had the problem of doing a
full-file self-heal, which takes a lot of time. I would suggest
upgrading your cluster to the 3.0.1 release, which is due the first
week of February. With the new self-heal in the 3.x releases you should
see much shorter rebuild times. If it's possible to compare the 3.0.1
rebuild times against OneFS from Isilon, that would help us improve it
too.
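For reference, the two-box mirror you describe is wired up on the
client side with a cluster/replicate volfile along these lines - a
minimal sketch only, where the host names (server1/server2) and the
exported brick name are placeholders, not your actual config:

    volume box1
      type protocol/client
      option transport-type tcp
      option remote-host server1      # first raid6 box (placeholder)
      option remote-subvolume brick   # exported brick volume (placeholder)
    end-volume

    volume box2
      type protocol/client
      option transport-type tcp
      option remote-host server2      # second raid6 box (placeholder)
      option remote-subvolume brick
    end-volume

    volume mirror
      type cluster/replicate          # AFR: writes go to both boxes;
      subvolumes box1 box2            # self-heal repairs a box that was down
    end-volume

Self-heal runs inside that replicate translator, which is why the 3.0
checksum-based algorithm matters here: it should only have to ship the
parts of a file that differ between box1 and box2 instead of rewriting
whole files.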
Thanks

> Whatever the solution is, I can tell you that the rebuild issues will
> only get worse as drives continue to get larger and the number of
> files/directories continues to grow. Sun's ZFS filesystem goes a long
> way toward fixing some of these problems; I just wish they would port
> it over to Linux.

I would suggest waiting for "btrfs".

> liam
>
> On Tue, Jan 5, 2010 at 2:17 PM, Arvids Godjuks <arvids.godjuks at gmail.com> wrote:
> > Consider this - a rebuild of a 1.5-2 TB HDD in a raid5/6 array can
> > easily take up to a few days to complete. During that time the
> > storage on that node will not perform well. A week ago I read a very
> > good article with research on this area; the only catch is that it's
> > in Russian, but it mentions a few English sources too. Maybe Google
> > Translate will help.
> > Here's the original link: http://habrahabr.ru/blogs/hardware/78311/
> > Here's the Google Translate version:
> > http://translate.google.com/translate?js=y&prev=_t&hl=en&ie=UTF-8&layout=1&eotf=1&u=http%3A%2F%2Fhabrahabr.ru%2Fblogs%2Fhardware%2F78311%2F&sl=ru&tl=en
> > (looks quite neat by the way)
> >
> > 2010/1/5 Liam Slusser <lslusser at gmail.com>:
> >> Larry & All,
> >>
> >> I would much rather rebuild a bad drive with a raid controller than
> >> have to wait for Gluster to do it. With a large number of files,
> >> doing a ls -aglR can take weeks. Also you don't NEED enterprise
> >> drives with a raid controller; I use desktop 1.5tb Seagate drives
> >> which are happy as a clam on a 3ware SAS card under a SAS expander.
> >>
> >> liam
> >>
> >
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
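Re: the "ls -aglR" trick Liam mentions above - with replicate,
self-heal is triggered when a file is accessed, so after a node outage
the usual way to force a full resync is to walk the whole mount. A
minimal sketch, where the mount point /mnt/glusterfs is a placeholder:

    find /mnt/glusterfs -type f -print0 | xargs -0 head -c1 > /dev/null

Reading one byte of each file is enough to make the replicate
translator compare the copies and repair the stale one; the crawl
itself is what takes days on a large file count, which is exactly the
pain point in this thread.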