Hello,
I created a post a few days ago titled "Turning Off Self Heal Options Don't Appear Work?", which can be found at the following link: http://www.gluster.org/pipermail/gluster-users/2015-January/020114.html
I never got a response, so I decided to set up a test in a lab environment. I am able to reproduce the same behavior there, so I'm hoping someone can help me.
I have discovered over time that if a single node in a 3-node replicated cluster holding many small files is offline for any length of time, it does a great deal of self-healing when it comes back online, which can cause the glusterfs and glusterfsd processes on the machines to spike to the point that the machines become unusable. I only have one volume, with a client mount on each server, and it hosts many websites running PHP. All is fine until the healing process goes into overdrive.
So, I attempted to turn off self-healing by setting the following three options:
gluster volume set gv0 cluster.data-self-heal off
gluster volume set gv0 cluster.entry-self-heal off
gluster volume set gv0 cluster.metadata-self-heal off
Note that I would rather not set cluster.self-heal-daemon to off on gv0, as then I can't see what needs healing so that I can heal it at a later time. The three settings above appear to have no effect at all.
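(For context, the way I keep track of what still needs healing is the heal-info command, roughly:

gluster volume heal gv0 info

and as far as I understand, that output depends on the self-heal daemon being enabled, which is why I want to leave it on.)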
Here is how I reproduced this in my lab:
Output from "gluster volume info gv0":
Volume Name: gv0
Type: Replicate
Volume ID: a55f8619-0789-4a1c-9cda-a903bc908fd1
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.1.116:/export/brick1
Brick2: 192.168.1.140:/export/brick1
Brick3: 192.168.1.123:/export/brick1
Options Reconfigured:
cluster.metadata-self-heal: off
cluster.entry-self-heal: off
cluster.data-self-heal: off
This was done using the latest version of Gluster as of this writing, v3.6.1, installed on CentOS 6.6 using the RPMs available from the Gluster web site.
Here is how I tested (a rough shell sketch of the steps follows the list):
- With all 3 nodes up, I put 4 simple text files on the cluster
- I then turned one node off
- Next I made a change to 2 of the text files
- Then I brought the previously turned off node back up
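In shell terms, the test looked roughly like this (the /mnt/gv0 client mount point is just an example; my actual mount point differs):

# on a client mount, with all 3 nodes up, create 4 simple text files
for f in file1 file2 file3 file4; do echo "test" > /mnt/gv0/$f.txt; done

# power off one node (192.168.1.123 in my case), then change 2 of the files
echo "changed" >> /mnt/gv0/file1.txt
echo "changed" >> /mnt/gv0/file2.txt

# power the node back on and watch its glustershd.log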
Upon doing so, I see far more than 2 instances of the following messages in glustershd.log:
[2015-01-15 23:19:30.471384] I [afr-self-heal-entry.c:545:afr_selfheal_entry_do] 0-gv0-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001
[2015-01-15 23:19:30.494714] I [afr-self-heal-common.c:476:afr_log_selfheal] 0-gv0-replicate-0: Completed entry selfheal on 00000000-0000-0000-0000-000000000001. source=0 sinks=
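(I counted these by grepping the self-heal daemon log on the returning node, something like:

grep -c "Completed entry selfheal" /var/log/glusterfs/glustershd.log

the log path is from my standard RPM install and may differ on other setups.)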
Questions:
- So is this a bug?
- Why am I seeing "entry selfheal" messages when this feature is supposed to be turned off?
- Also, why am I seeing far more than 2 selfheal messages when I only changed 2 files while the single node was down?
- Finally, how do I really turn off these self-heals without completely turning off cluster.self-heal-daemon, for the reasons mentioned above?
Thank you for any insight you may be able to provide on this.
Kyle