On 17/12/2015 10:51, Nicolas Ecarnot wrote:
On 17/12/2015 10:10, Nicolas Ecarnot wrote:
Hello,
Our setup: 3 CentOS 7.2 nodes running Gluster 3.7.6 in replica 3, used as
storage+compute for an oVirt 3.5.6 DC.
Two days ago, we added some Nagios/Centreon monitoring that checks the
state of the heal queue every 5 minutes
(something like "gluster volume heal some_vol info" with the appropriate
grep).
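For illustration, here is a minimal sketch of such a check written as a Nagios-style plugin. It assumes that "gluster volume heal some_vol info" prints one "Number of entries: N" line per brick; the function names and the warning/critical thresholds are hypothetical and are not the actual check used here.

```shell
#!/bin/sh
# Sketch of a Nagios-style heal queue check (assumptions: one
# "Number of entries: N" line per brick; names and thresholds are hypothetical).

# Sum every "Number of entries: N" line of "gluster volume heal VOL info",
# read on stdin. Prints 0 when no such line is found.
sum_heal_entries() {
  awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Print a Nagios status line with perfdata and return the matching code
# (0 OK, 1 WARNING, 2 CRITICAL).
# $1 = total entries, $2 = warning threshold, $3 = critical threshold.
heal_status() {
  total=$1 warn=$2 crit=$3
  if [ "$total" -ge "$crit" ]; then
    echo "CRITICAL - $total entries to heal | entries=$total"
    return 2
  elif [ "$total" -ge "$warn" ]; then
    echo "WARNING - $total entries to heal | entries=$total"
    return 1
  fi
  echo "OK - $total entries to heal | entries=$total"
  return 0
}

# Example invocation (commented out so the file can be sourced without gluster):
# total=$(gluster volume heal some_vol info | sum_heal_entries)
# heal_status "$total" 1 10; exit $?
```

Summing across bricks gives a single graphable value per volume; graphing each brick separately would instead show which node holds the pending entries.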
I expected the "Number of entries" of every node to show up in the graph
as a flat zero line most of the time, except in the rare cases of a
node reboot, after which healing starts and takes a few minutes
(sometimes hours) but completes fine.
Instead, we see the heal queue processing 2 or 3 files about
4 times an hour, all day long.
Our DC is a small one with few VMs, so no more than 8 big
files are stored in GlusterFS.
I'm very surprised to see that these files constantly need healing, as I
thought I had understood that reads/writes were always synchronous,
and that replica 3 meant that every file was fully synced and committed
at all times.
I've also read about the 10-minute cron-like job of the self-heal
daemon, which we use with its default settings, but that is a second point.
The first point leads to:
- Why do we see such frequent desynchronizations between nodes?
- Which logs should I read to confirm this?
- What should I check?
Replying to myself, but having found:
https://www.mail-archive.com/gluster-users%40gluster.org/msg20611.html
shouldn't I be surprised to see this:
gluster volume get data cluster.op-version
Option Value
------ -----
cluster.op-version 30600
in a 3.7.6 gluster cluster?
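For reference, a minimal sketch of how the op-version check and bump can be scripted. It assumes that 30706 is the op-version matching 3.7.6 (worth checking against the release notes of the exact version installed) and that "gluster volume set all cluster.op-version" is run once all peers are upgraded; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare the cluster op-version with the expected one.
# Assumption: 30706 is the op-version of Gluster 3.7.6; verify before use.
TARGET_OP_VERSION=30706

# Extract the value from "gluster volume get <vol> cluster.op-version",
# which prints an "Option Value" table, read on stdin.
current_op_version() {
  awk '$1 == "cluster.op-version" { print $2 }'
}

# Succeed (status 0) when the given op-version is below the target.
needs_op_version_bump() {
  [ "$1" -lt "$TARGET_OP_VERSION" ]
}

# Example (commented out; requires glusterd running and all peers upgraded):
# cur=$(gluster volume get data cluster.op-version | current_op_version)
# if needs_op_version_bump "$cur"; then
#   gluster volume set all cluster.op-version "$TARGET_OP_VERSION"
# fi
```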
OK, I bumped cluster.op-version, but there is no improvement.
I'm opening https://bugzilla.redhat.com/show_bug.cgi?id=1294675
--
Nicolas ECARNOT
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users