Gluster endless heal

Mahdi Adnan <mahdi.adnan@xxxxxxxxxxx> · Wed, 17 Jan 2018 18:50:51 +0000

Hi,

I have an issue with Gluster 3.8.14.

The cluster has 4 nodes with replica count 2. One of the nodes went offline for around 15 minutes; when it came back online, self-heal was triggered and has not stopped since. It has been running for 3 days now, maxing out brick utilization without actually healing anything.

The bricks are all SSDs, and the log on the source node is being flooded with the following messages:

[2018-01-17 18:37:11.815247] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-ovirt_imgs-replicate-0: Completed data selfheal on 450fb07a-e95d-48ef-a229-48917557c278. sources=[0]  sinks=1 
[2018-01-17 18:37:12.830887] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-ovirt_imgs-replicate-0: performing metadata selfheal on ce0f545d-635a-40c0-95eb-ccfc71971f78
[2018-01-17 18:37:12.845978] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-ovirt_imgs-replicate-0: Completed metadata selfheal on ce0f545d-635a-40c0-95eb-ccfc71971f78. sources=[0]  sinks=1
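
One way to check whether the same files are being healed over and over is to tally the GFIDs in "Completed ... selfheal" lines like the ones above. The following is a minimal Python sketch, not a Gluster tool: the names `count_heals` and `repeated` are hypothetical, and the regular expression assumes the log format matches the excerpt shown here.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... Completed data selfheal on 450fb07a-e95d-48ef-a229-48917557c278. ..."
# Assumption: the GFID is always a 36-character lowercase hex UUID.
HEAL_RE = re.compile(
    r"Completed (?P<kind>\w+) selfheal on (?P<gfid>[0-9a-f-]{36})"
)


def count_heals(lines):
    """Return a Counter mapping (gfid, heal kind) to completed-heal count."""
    counts = Counter()
    for line in lines:
        match = HEAL_RE.search(line)
        if match:
            counts[(match.group("gfid"), match.group("kind"))] += 1
    return counts


def repeated(counts, threshold=2):
    """Keep only the entries healed at least `threshold` times."""
    return {key: n for key, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    import sys

    # Usage: pipe the log being flooded into this script on stdin.
    for (gfid, kind), n in sorted(repeated(count_heals(sys.stdin)).items()):
        print(f"{n:6d}  {kind:9s} {gfid}")
```

If the same GFIDs show up many times, the heal is looping on a fixed set of files rather than working through a backlog.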

---

I tried restarting glusterd and rebooting the node after about 24 hours of healing, but it did not help. Before the reboot several bricks were healing; since the reboot, only 4 bricks are healing.

The volume is used as an oVirt storage domain, with sharding enabled.

There are no errors or warnings on either node, just info messages about AFR healing.

Any idea what's going on, or where I should start looking?

-- 

Respectfully

Mahdi A. Mahdi
