Thanks, it's always good to know I'm not alone with this problem! Also good
to know I haven't missed something blindingly obvious in the config/setup.

We had our VPN drop between the DCs yesterday afternoon, which resulted in
high load on one gluster server at a time for about 10 minutes once the VPN
was back up, so unless anyone else has any ideas, I think looking at
alternatives is our only way forward. I had a quick look the other day and
Ceph was one of the possibilities that stood out for me.

Thanks.

On 14 May 2013 03:21, Toby Corkindale <toby.corkindale at strategicdata.com.au> wrote:

> On 11/05/13 00:40, Matthew Day wrote:
>
>> Hi all,
>>
>> I'm pretty new to Gluster, and the company I work for uses it for
>> storage across 2 data centres. An issue has cropped up fairly recently
>> with regards to the self-heal mechanism.
>>
>> Occasionally the connection between these 2 Gluster servers breaks or
>> drops momentarily. Due to the nature of the business it's highly likely
>> that files have been written during this time. When the self-heal daemon
>> runs it notices a discrepancy and gets the volume up to date. The
>> problem we've been seeing is that this appears to cause the CPU load to
>> increase massively on both servers whilst the healing process takes place.
>>
>> After trying to find out if there were any persistent network issues I
>> tried recreating this on a test system and can now reproduce it at will.
>> Our test system setup is made up of 3 VMs: 2 Gluster servers and a
>> client. The process to cause this was:
>>
>> Add an iptables rule to block one of the Gluster servers from being
>> reached by the other server and the client.
>> Create some random files on the client.
>> Flush the iptables rules out so the server is reachable again.
>> Force a self-heal to run.
>> Watch as the load on the Gluster servers goes bananas.
>>
>> The problem with this is that whilst the self-heal happens one of the
>> gluster servers will be inaccessible from the client, meaning no files
>> can be read or written, causing problems for our users.
>>
>> I've been searching for a solution, or at least someone else who has
>> been having the same problem, and not found anything. I don't know if
>> this is a bug or a config issue (see below for config details). I've
>> tried a variety of different options but none of them have had any
>> effect.
>
>
> For what it's worth.. I get this same behaviour, and our gluster servers
> aren't even in separate data centres. It's not always the self-heal daemon
> that triggers it -- sometimes the client gets in first.
>
> Either way -- while recovery occurs, the available i/o to clients drops to
> effectively nothing, and they stall until recovery completes.
>
> I believe this problem is most visible when your architecture contains a
> lot of small files per directory. If you can change your filesystem layout
> to avoid this, then you may not be hit as hard.
> (eg. Take an MD5 hash of the path and filename, then store the file under
> a subdirectory named after the first few characters in the hash. (2 hex
> chars will divide the files-per-directory by 256, three by 4096) eg.
> "folder/file.dat" becomes "66/folder/file.dat")
>
> I've given up on GlusterFS though; have a look at Ceph and RiakCS if your
> systems suit Swift/S3 style storage.
>
> -Toby
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
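For anyone wanting to repeat the test above, the quoted repro steps can be sketched as a dry-run script. The hostname "gluster2", the volume name "testvol", and the client mount path are all hypothetical; swap the echo in run() for real execution (which needs root on the right machines).

```shell
# Dry-run sketch of the quoted repro steps; prints each command
# instead of executing it.
run() { echo "+ $*"; }

SERVER2=gluster2   # hypothetical hostname of the server to cut off
VOL=testvol        # hypothetical replicated volume name

# 1. Block one Gluster server from the other server and the client.
run iptables -A INPUT -s "$SERVER2" -j DROP

# 2. Create some random files on the client's mount of the volume.
run dd if=/dev/urandom of=/mnt/"$VOL"/random.dat bs=1K count=512

# 3. Flush the iptables rules out so the server is reachable again.
run iptables -F INPUT

# 4. Force a self-heal to run.
run gluster volume heal "$VOL" full

# 5. Watch the load on the Gluster servers.
run uptime
```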
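Toby's small-files mitigation (shard files into subdirectories named after the first couple of hex characters of an MD5 of the path) can be sketched like this; "folder/file.dat" is just an example path, and md5sum is assumed to be available.

```shell
# Shard a file path under a 2-hex-char MD5 prefix directory,
# so files spread across up to 256 top-level subdirectories.
path="folder/file.dat"
prefix=$(printf '%s' "$path" | md5sum | cut -c1-2)
mkdir -p "$(dirname "$prefix/$path")"   # creates <prefix>/folder/
echo "$prefix/$path"                    # the sharded location
```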