From the gl4.dump file:
glusterd.peer4.hostname=gl5
glusterd.peer4.port=0
glusterd.peer4.state=3
glusterd.peer4.quorum-action=""
glusterd.peer4.quorum-contrib=2
glusterd.peer4.detaching=0
glusterd.peer4.locked=0
glusterd.peer4.rpc.peername=
glusterd.peer4.rpc.connected=0 <===== this indicates that gl5 is not connected to gl4, so the add-brick command failed as it is supposed to in this case
glusterd.peer4.rpc.total-bytes-read=0
glusterd.peer4.rpc.total-bytes-written=0
glusterd.peer4.rpc.ping_msgs_sent=0
glusterd.peer4.rpc.msgs_sent=0
And the same holds true for gl6, as per this dump. So the issue is with the gl4 node.
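For reference, the statedump above can be regenerated and the peer RPC state checked with something like the following (the dump filename under /var/run/gluster varies with PID and timestamp, so the glob is an assumption):

kill -SIGUSR1 $(pidof glusterd)
grep -E 'rpc\.(peername|connected)' /var/run/gluster/glusterdump.*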
Now, in gl4's glusterd log I see repeated entries of the following logs:
[2016-12-13 16:35:31.438462] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl5
[2016-12-13 16:35:33.440155] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl6
[2016-12-13 16:35:34.441639] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl5
[2016-12-13 16:35:36.454546] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl6
[2016-12-13 16:35:37.456062] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl5
The above indicates that gl4 is not able to resolve the DNS names for gl5 & gl6, whereas gl5 & gl6 can resolve gl4. Please check your DNS configuration and see if there are any incorrect entries in it. From our side, what we need to check is why peer status didn't show both gl5 & gl6 as disconnected.
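A quick way to confirm this from gl4 (a hypothetical check; getent exercises the same NSS resolver path that glusterd's name lookups go through):

getent hosts gl5 gl6            # should print one address per name
grep -E 'gl[1-6]' /etc/hosts    # if the cluster relies on static entries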
On Wed, Dec 14, 2016 at 12:44 AM, Cedric Lemarchand <yipikai7@xxxxxxxxx> wrote:
Thanks Atin, the files you asked for: https://we.tl/XrOvFhffGq

On 13 Dec 2016, at 19:08, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

Thanks, we will get back on this. In the meantime, can you please also share the glusterd statedump file from both the nodes? The way to take a statedump is 'kill -SIGUSR1 $(pidof glusterd)' and the file can be found in the /var/run/gluster directory.

On Tue, 13 Dec 2016 at 22:11, Cedric Lemarchand <yipikai7@xxxxxxxxx> wrote:

1. sorry, 3.9.0-1
2. no, it does nothing
3. here they are, from gl1 to gl6: https://we.tl/EPaMs6geoR

On 13 Dec 2016, at 16:49, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

And 3. In case 2 doesn't work, please provide the glusterd log files from gl1 & gl5.

On Tue, Dec 13, 2016 at 9:16 PM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

1. Could you mention which gluster version you are running?
2. Does restarting the glusterd instance on gl1 & gl5 solve the issue (after removing the volume-id xattr from the bricks)?

On Tue, Dec 13, 2016 at 8:56 PM, Cedric Lemarchand <yipikai7@xxxxxxxxx> wrote:

Hello,
When I try to add 3 bricks to a working cluster composed of 3 nodes / 3 bricks in dispersed mode 2+1, it fails like this:
root@gl1:~# gluster volume add-brick vol1 gl4:/data/br1 gl5:/data/br1 gl6:/data/br1
volume add-brick: failed: Pre Validation failed on gl4. Host gl5 not connected
However, all peers are connected and there are no networking issues:
root@gl1:~# gluster peer status
Number of Peers: 5
Hostname: gl2
Uuid: 616f100f-a3f4-46e4-b161-ee5db5a60e26
State: Peer in Cluster (Connected)
Hostname: gl3
Uuid: acb828b8-f4b3-42ab-a9d2-b3e7b917dc9a
State: Peer in Cluster (Connected)
Hostname: gl4
Uuid: 813ad056-5e84-4fdb-ac13-38d24c748bc4
State: Peer in Cluster (Connected)
Hostname: gl5
Uuid: a7933aeb-b08b-4ebb-a797-b8ecbe5a03c6
State: Peer in Cluster (Connected)
Hostname: gl6
Uuid: 63c9a6c1-0adf-4cf5-af7b-b28a60911c99
State: Peer in Cluster (Connected)
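Worth noting: peer status reflects the view of whichever node it is run on, so a single-node check can miss an asymmetric failure like this one. A hypothetical sweep, assuming passwordless SSH between the nodes:

for h in gl1 gl2 gl3 gl4 gl5 gl6; do
  echo "== $h =="
  ssh root@$h gluster peer status
done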
When I try a second time, the error is different:
root@gl1:~# gluster volume add-brick vol1 gl4:/data/br1 gl5:/data/br1 gl6:/data/br1
volume add-brick: failed: Pre Validation failed on gl5. /data/br1 is already part of a volume
Pre Validation failed on gl6. /data/br1 is already part of a volume
Pre Validation failed on gl4. /data/br1 is already part of a volume
It seems the previous attempt, even though it failed, created the gluster attributes on the file system, as shown by attr on gl4/5/6:
Attribute "glusterfs.volume-id" has a 16 byte value for /data/br1
I already purged gluster and reformatted the bricks on gl4/5/6 but the issue persists. Any ideas? Did I miss something?
Some information:
root@gl1:~# gluster volume status
Status of volume: vol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gl1:/data/br1                         49152     0          Y       23403
Brick gl2:/data/br1                         49152     0          Y       14545
Brick gl3:/data/br1                         49152     0          Y       11348
Self-heal Daemon on localhost               N/A       N/A        Y       24766
Self-heal Daemon on gl4                     N/A       N/A        Y       1087
Self-heal Daemon on gl5                     N/A       N/A        Y       1080
Self-heal Daemon on gl3                     N/A       N/A        Y       12321
Self-heal Daemon on gl2                     N/A       N/A        Y       15496
Self-heal Daemon on gl6                     N/A       N/A        Y       1091

Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks
root@gl1:~# gluster volume info
Volume Name: vol1
Type: Disperse
Volume ID: bb563884-0e2a-4757-9fd5-cb851ba113c3
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gl1:/data/br1
Brick2: gl2:/data/br1
Brick3: gl3:/data/br1
Options Reconfigured:
features.scrub-freq: hourly
features.scrub: Inactive
features.bitrot: off
cluster.disperse-self-heal-daemon: enable
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
--
~ Atin (atinm)
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users