Hi all,

My production setup also suffers from total unavailability outages when
self-heal gets real work to do. On a 4-server distributed-replicate 14x2
cluster where 1 server has been down for 2 days, the volume becomes
completely unresponsive when we bring the server back into the cluster.
I ticketed it here:

https://bugzilla.redhat.com/show_bug.cgi?id=963223
  "Re-inserting a server in a v3.3.2qa2 distributed-replicate volume DOSes the volume"

Does anyone know of a way to slow down self-heal so that it does not make
the volume unresponsive?
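The only self-heal tuning knobs I have found so far are the ones below. I
have not yet verified that they actually prevent the outage, so treat this
as a sketch rather than a fix; 'myvol' is just a placeholder for the volume
name.

    VOL=myvol

    # Leave healing to the self-heal daemon only, so client I/O does not
    # also start healing every out-of-sync file it touches.
    gluster volume set $VOL cluster.data-self-heal off
    gluster volume set $VOL cluster.metadata-self-heal off
    gluster volume set $VOL cluster.entry-self-heal off

    # Heal fewer files in the background at the same time (default is 16).
    gluster volume set $VOL cluster.background-self-heal-count 1

    # Copy only changed blocks instead of whole files during data self-heal.
    gluster volume set $VOL cluster.data-self-heal-algorithm diff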
The "unavailability due to high load caused by gluster itself" pattern
repeats itself in several cases:

https://bugzilla.redhat.com/show_bug.cgi?id=950024
  replace-brick immediately saturates IO on the source brick, making the entire volume unavailable, then dies
https://bugzilla.redhat.com/show_bug.cgi?id=950006
  replace-brick activity dies, destination glusterfs spins at 100% CPU forever
https://bugzilla.redhat.com/show_bug.cgi?id=832609
  Glusterfsd hangs if the brick filesystem becomes unresponsive, causing all clients to lock up
https://bugzilla.redhat.com/show_bug.cgi?id=962875
  Entire volume DOSes itself when a node reboots and runs fsck on its bricks while the network is up
https://bugzilla.redhat.com/show_bug.cgi?id=963223
  Re-inserting a server in a v3.3.2qa2 distributed-replicate volume DOSes the volume

There are probably more, but these are the ones that affected my servers.

I also had to stop a rebalance action because of too-high load on the above
cluster (running with 3 out of 4 servers), which caused another service
unavailability outage. This might be related to 1 server being down, as
rebalance 'behaved' better before. I have not filed a ticket for this yet.

This pattern really must be fixed, sooner rather than later, as it makes
running a production-level service with gluster impossible.

regards,
   Hans Lambermont

Darren wrote on 20130514:
> Thanks, it's always good to know I'm not alone with this problem! Also good
> to know I haven't missed something blindingly obvious in the config/setup.
>
> We had our VPN drop between the DCs yesterday afternoon, which resulted in
> high load on 1 gluster server at a time for about 10 minutes once the VPN
> was back up, so unless anyone else has any ideas, I think looking at
> alternatives is our only way forward. I had a quick look the other day and
> Ceph was one of the possibilities that stood out for me.
>
> Thanks.
>
>
> On 14 May 2013 03:21, Toby Corkindale
> <toby.corkindale at strategicdata.com.au> wrote:
>
> > On 11/05/13 00:40, Matthew Day wrote:
> >
> >> Hi all,
> >>
> >> I'm pretty new to Gluster, and the company I work for uses it for
> >> storage across 2 data centres. An issue has cropped up fairly recently
> >> with regards to the self-heal mechanism.
> >>
> >> Occasionally the connection between these 2 Gluster servers breaks or
> >> drops momentarily. Due to the nature of the business it's highly likely
> >> that files have been written during this time. When the self-heal daemon
> >> runs it notices the discrepancy and gets the volume up to date. The
> >> problem we've been seeing is that this appears to cause the CPU load to
> >> increase massively on both servers whilst the healing process takes place.
> >>
> >> After trying to find out if there were any persistent network issues I
> >> tried recreating this on a test system and can now reproduce it at will.
> >> Our test system setup is made up of 3 VMs: 2 Gluster servers and a client.
> >> The process to cause this was:
> >> Add an iptables rule to block one of the Gluster servers from being
> >> reached by the other server and the client.
> >> Create some random files on the client.
> >> Flush the iptables rules out so the server is reachable again.
> >> Force a self-heal to run.
> >> Watch as the load on the Gluster servers goes bananas.
> >>
> >> The problem with this is that while the self-heal happens, one of the
> >> gluster servers will be inaccessible from the client, meaning no files
> >> can be read or written, causing problems for our users.
> >>
> >> I've been searching for a solution, or at least someone else who has
> >> been having the same problem, and have not found anything. I don't know
> >> if this is a bug or a config issue (see below for config details). I've
> >> tried a variety of different options but none of them have had any effect.
> >
> > For what it's worth, I get this same behaviour, and our gluster servers
> > aren't even in separate data centres. It's not always the self-heal daemon
> > that triggers it -- sometimes the client gets in first.
> >
> > Either way -- while recovery occurs, the available I/O to clients drops to
> > effectively nothing, and they stall until recovery completes.
> >
> > I believe this problem is most visible when your architecture contains a
> > lot of small files per directory. If you can change your filesystem layout
> > to avoid this, then you may not be hit as hard.
> > (E.g. take an MD5 hash of the path and filename, then store the file under
> > a subdirectory named after the first few characters of the hash; 2 chars
> > will divide the files-per-directory by ~1300, three by ~47k. E.g.
> > "folder/file.dat" becomes "66/folder/file.dat".)
> >
> > I've given up on GlusterFS though; have a look at Ceph and RiakCS if your
> > systems suit Swift/S3 style storage.
> >
> > -Toby
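Matthew's reproduction steps above match what I see in production. Scripted
roughly, with 'gluster1', 'client1', 'testvol' and the mount point as
placeholders for your own test setup, it comes down to:

    # On the gluster server to be isolated: block the other server and the
    # client, so new writes only land on the surviving replica.
    iptables -I INPUT -s gluster1 -j DROP
    iptables -I INPUT -s client1  -j DROP

    # On the client: create some random files while that server is cut off.
    for i in $(seq 1 1000); do
        dd if=/dev/urandom of=/mnt/glustervol/heal-test.$i bs=64k count=1 2>/dev/null
    done

    # On the isolated server: remove the rules so it is reachable again.
    iptables -D INPUT -s gluster1 -j DROP
    iptables -D INPUT -s client1  -j DROP

    # On either server: force a self-heal and watch the load climb.
    gluster volume heal testvol full
    gluster volume heal testvol info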
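For anyone who wants to try the directory-layout workaround Toby describes,
this is roughly what it looks like. The mount point, helper name and the
2-character prefix length are only placeholders for illustration; with hex
MD5 prefixes, 2 characters give 256 buckets.

    # Copy a file into the gluster mount under an MD5-prefix subdirectory,
    # so that no single directory accumulates all of the files.
    store_sharded() {
        relpath=$1                          # e.g. folder/file.dat
        root=/mnt/glustervol                # placeholder mount point
        prefix=$(printf '%s' "$relpath" | md5sum | cut -c1-2)
        mkdir -p "$root/$prefix/$(dirname "$relpath")"
        cp "$relpath" "$root/$prefix/$relpath"
    }

    # "folder/file.dat" ends up as e.g. /mnt/glustervol/66/folder/file.dat
    store_sharded folder/file.dat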