Hi, everyone.
Here is the output of the status command for the volume
and the peers:
[root@web1 ~]# gluster volume status
Status of volume: share
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick c10839:/gluster                       49152     0          Y       540
Brick c10840:/gluster                       49152     0          Y       533
Brick web3:/gluster                         49152     0          Y       782
Self-heal Daemon on localhost               N/A       N/A        Y       602
Self-heal Daemon on web3                    N/A       N/A        Y       790
Self-heal Daemon on web4                    N/A       N/A        Y       636
Self-heal Daemon on web2                    N/A       N/A        Y       523
Task Status of Volume share
------------------------------------------------------------------------------
There are no active volume tasks
[root@web1 ~]# gluster peer status
Number of Peers: 3
Hostname: web3
Uuid: b138b4d5-8623-4224-825e-1dfdc3770743
State: Peer in Cluster (Connected)

Hostname: web2
Uuid: b3926959-3ae8-4826-933a-4bf3b3bd55aa
State: Peer in Cluster (Connected)
Other names:
c10840.sgvps.net

Hostname: web4
Uuid: f7553cba-c105-4d2c-8b89-e5e78a269847
State: Peer in Cluster (Connected)
All in all, we have three servers that host bricks and
actually store the data, and one server that is just a peer
and is connected to one of the other servers.
We suspect that the issue is related to the self-heal
daemons, but we are not sure. Could you please advise how to
debug this issue and what could be causing the whole cluster
to go down? If it is the self-heal daemon, as we suspect, do
you think it is safe to disable it? And if some of our
settings are causing this problem, could you please advise
how to configure the cluster to avoid it?
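In case it is useful, here is roughly what we were planning to run to inspect the self-heal state before touching anything (just a sketch using the standard gluster CLI against our volume `share`; the log paths are the defaults on our systems, so they may differ on yours):

```shell
# List files the self-heal daemon still considers pending
gluster volume heal share info

# Check specifically for split-brain entries
gluster volume heal share info split-brain

# Tail the self-heal daemon log on each node (default log location)
tail -n 100 /var/log/glusterfs/glustershd.log

# If disabling self-heal turns out to be the right call, we assume
# it would be done like this (and re-enabled with "on"):
# gluster volume set share cluster.self-heal-daemon off
```

We have held off on the last command, since we are not sure whether disabling the daemon is safe for a replicated volume.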