Hi all,

My production setup also suffers from total unavailability outages when
self-heal gets real work to do. On a 4-server distributed-replicate 14x2
cluster where 1 server has been down for 2 days, the volume becomes
completely unresponsive when we bring the server back into the cluster.
I ticketed it here:

https://bugzilla.redhat.com/show_bug.cgi?id=963223
  "Re-inserting a server in a v3.3.2qa2 distributed-replicate volume DOSes the volume"

Does anyone know of a way to slow down self-heal so that it does not make
the volume unresponsive?
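The only self-heal tuning knobs I have found so far are the ones below. I
have not yet verified that they actually prevent the outage, so treat this
as a sketch rather than a fix; 'myvol' is just a placeholder for the volume
name.

    VOL=myvol

    # Leave healing to the self-heal daemon only, so client I/O does not
    # also start healing every out-of-sync file it touches.
    gluster volume set $VOL cluster.data-self-heal off
    gluster volume set $VOL cluster.metadata-self-heal off
    gluster volume set $VOL cluster.entry-self-heal off

    # Heal fewer files in the background at the same time (default is 16).
    gluster volume set $VOL cluster.background-self-heal-count 1

    # Copy only changed blocks instead of whole files during data self-heal.
    gluster volume set $VOL cluster.data-self-heal-algorithm diff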
The "unavailability due to high load caused by gluster itself" pattern
repeats itself in several cases:

https://bugzilla.redhat.com/show_bug.cgi?id=950024
  replace-brick immediately saturates IO on the source brick, making the entire volume unavailable, then dies
https://bugzilla.redhat.com/show_bug.cgi?id=950006
  replace-brick activity dies, destination glusterfs spins at 100% CPU forever
https://bugzilla.redhat.com/show_bug.cgi?id=832609
  Glusterfsd hangs if the brick filesystem becomes unresponsive, causing all clients to lock up
https://bugzilla.redhat.com/show_bug.cgi?id=962875
  Entire volume DOSes itself when a node reboots and runs fsck on its bricks while the network is up
https://bugzilla.redhat.com/show_bug.cgi?id=963223
  Re-inserting a server in a v3.3.2qa2 distributed-replicate volume DOSes the volume

There are probably more, but these are the ones that affected my servers.

I also had to stop a rebalance action because of too-high load on the above
cluster (running with 3 out of 4 servers), which caused another service
unavailability outage. This might be related to 1 server being down, as
rebalance 'behaved' better before. I have not filed a ticket for this yet.

This pattern really must be fixed, sooner rather than later, as it makes
running a production-level service with gluster impossible.

regards,
   Hans Lambermont

Darren wrote on 20130514:
> Thanks, it's always good to know I'm not alone with this problem! Also good
> to know I haven't missed something blindingly obvious in the config/setup.
>
> We had our VPN drop between the DCs yesterday afternoon, which resulted in
> high load on 1 gluster server at a time for about 10 minutes once the VPN
> was back up, so unless anyone else has any ideas, I think looking at
> alternatives is our only way forward. I had a quick look the other day and
> Ceph was one of the possibilities that stood out for me.
>
> Thanks.
>
>
> On 14 May 2013 03:21, Toby Corkindale
> <toby.corkindale at strategicdata.com.au> wrote:
>
> > On 11/05/13 00:40, Matthew Day wrote:
> >
> >> Hi all,
> >>
> >> I'm pretty new to Gluster, and the company I work for uses it for
> >> storage across 2 data centres. An issue has cropped up fairly recently
> >> with regards to the self-heal mechanism.
> >>
> >> Occasionally the connection between these 2 Gluster servers breaks or
> >> drops momentarily. Due to the nature of the business it's highly likely
> >> that files have been written during this time. When the self-heal daemon
> >> runs it notices the discrepancy and gets the volume up to date. The
> >> problem we've been seeing is that this appears to cause the CPU load to
> >> increase massively on both servers whilst the healing process takes place.
> >>
> >> After trying to find out if there were any persistent network issues I
> >> tried recreating this on a test system and can now reproduce it at will.
> >> Our test system setup is made up of 3 VMs: 2 Gluster servers and a client.
> >> The process to cause this was:
> >> Add an iptables rule to block one of the Gluster servers from being
> >> reached by the other server and the client.
> >> Create some random files on the client.
> >> Flush the iptables rules out so the server is reachable again.
> >> Force a self-heal to run.
> >> Watch as the load on the Gluster servers goes bananas.
> >>
> >> The problem with this is that while the self-heal happens, one of the
> >> gluster servers will be inaccessible from the client, meaning no files
> >> can be read or written, causing problems for our users.
> >>
> >> I've been searching for a solution, or at least someone else who has
> >> been having the same problem, and have not found anything. I don't know
> >> if this is a bug or a config issue (see below for config details). I've
> >> tried a variety of different options but none of them have had any effect.
> >
> > For what it's worth, I get this same behaviour, and our gluster servers
> > aren't even in separate data centres. It's not always the self-heal daemon
> > that triggers it -- sometimes the client gets in first.
> >
> > Either way -- while recovery occurs, the available I/O to clients drops to
> > effectively nothing, and they stall until recovery completes.
> >
> > I believe this problem is most visible when your architecture contains a
> > lot of small files per directory. If you can change your filesystem layout
> > to avoid this, then you may not be hit as hard.
> > (E.g. take an MD5 hash of the path and filename, then store the file under
> > a subdirectory named after the first few characters of the hash; 2 chars
> > will divide the files-per-directory by ~1300, three by ~47k. E.g.
> > "folder/file.dat" becomes "66/folder/file.dat".)
> >
> > I've given up on GlusterFS though; have a look at Ceph and RiakCS if your
> > systems suit Swift/S3 style storage.
> >
> > -Toby
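Matthew's reproduction steps above match what I see in production. Scripted
roughly, with 'gluster1', 'client1', 'testvol' and the mount point as
placeholders for your own test setup, it comes down to:

    # On the gluster server to be isolated: block the other server and the
    # client, so new writes only land on the surviving replica.
    iptables -I INPUT -s gluster1 -j DROP
    iptables -I INPUT -s client1  -j DROP

    # On the client: create some random files while that server is cut off.
    for i in $(seq 1 1000); do
        dd if=/dev/urandom of=/mnt/glustervol/heal-test.$i bs=64k count=1 2>/dev/null
    done

    # On the isolated server: remove the rules so it is reachable again.
    iptables -D INPUT -s gluster1 -j DROP
    iptables -D INPUT -s client1  -j DROP

    # On either server: force a self-heal and watch the load climb.
    gluster volume heal testvol full
    gluster volume heal testvol info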
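For anyone who wants to try the directory-layout workaround Toby describes,
this is roughly what it looks like. The mount point, helper name and the
2-character prefix length are only placeholders for illustration; with hex
MD5 prefixes, 2 characters give 256 buckets.

    # Copy a file into the gluster mount under an MD5-prefix subdirectory,
    # so that no single directory accumulates all of the files.
    store_sharded() {
        relpath=$1                          # e.g. folder/file.dat
        root=/mnt/glustervol                # placeholder mount point
        prefix=$(printf '%s' "$relpath" | md5sum | cut -c1-2)
        mkdir -p "$root/$prefix/$(dirname "$relpath")"
        cp "$relpath" "$root/$prefix/$relpath"
    }

    # "folder/file.dat" ends up as e.g. /mnt/glustervol/66/folder/file.dat
    store_sharded folder/file.dat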