From the gl4.dump file:
glusterd.peer4.hostname=gl5
glusterd.peer4.port=0
glusterd.peer4.state=3
glusterd.peer4.quorum-action=""
glusterd.peer4.quorum-contrib=2
glusterd.peer4.detaching=0
glusterd.peer4.locked=0
glusterd.peer4.rpc.peername=
glusterd.peer4.rpc.connected=0 <===== this indicates that gl5 is not connected to gl4, so the add-brick command failed as it is supposed to in this case
glusterd.peer4.rpc.total-bytes-read=0
glusterd.peer4.rpc.total-bytes-written=0
glusterd.peer4.rpc.ping_msgs_sent=0
glusterd.peer4.rpc.msgs_sent=0
And the same holds true for gl6, as per this dump. So the issue is with the gl4 node.
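For reference, the statedump above can be regenerated and the peer RPC state checked with something like the following (the dump filename under /var/run/gluster varies with PID and timestamp, so the glob is an assumption):

kill -SIGUSR1 $(pidof glusterd)
grep -E 'rpc\.(peername|connected)' /var/run/gluster/glusterdump.*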
Now, in gl4's glusterd log I see repeated entries of the following logs:
[2016-12-13 16:35:31.438462] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl5
[2016-12-13 16:35:33.440155] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl6
[2016-12-13 16:35:34.441639] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl5
[2016-12-13 16:35:36.454546] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl6
[2016-12-13 16:35:37.456062] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl5
The above indicates that gl4 is not able to resolve the DNS names for gl5 & gl6, whereas gl5 & gl6 can resolve gl4. Please check your DNS configuration and see if there are any incorrect entries in it. From our side, what we need to check is why peer status didn't show both gl5 & gl6 as disconnected.
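A quick way to confirm this from gl4 (a hypothetical check; getent exercises the same NSS resolver path that glusterd's name lookups go through):

getent hosts gl5 gl6            # should print one address per name
grep -E 'gl[1-6]' /etc/hosts    # if the cluster relies on static entries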
On Wed, Dec 14, 2016 at 12:44 AM, Cedric Lemarchand <yipikai7@xxxxxxxxx> wrote:
Thanks Atin, the files you asked for: https://we.tl/XrOvFhffGq

On 13 Dec 2016, at 19:08, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

Thanks, we will get back on this. In the meantime, can you please also share the glusterd statedump file from both the nodes? The way to take a statedump is 'kill -SIGUSR1 $(pidof glusterd)' and the file can be found in the /var/run/gluster directory.

On Tue, 13 Dec 2016 at 22:11, Cedric Lemarchand <yipikai7@xxxxxxxxx> wrote:

1. sorry, 3.9.0-1
2. no, it does nothing
3. here they are, from gl1 to gl6: https://we.tl/EPaMs6geoR

On 13 Dec 2016, at 16:49, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

And 3. In case 2 doesn't work, please provide the glusterd log files from gl1 & gl5.

On Tue, Dec 13, 2016 at 9:16 PM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

1. Could you mention which gluster version you are running?
2. Does restarting the glusterd instance on gl1 & gl5 solve the issue (after removing the volume-id xattr from the bricks)?

On Tue, Dec 13, 2016 at 8:56 PM, Cedric Lemarchand <yipikai7@xxxxxxxxx> wrote:

Hello,
When I try to add 3 bricks to a working cluster composed of 3 nodes / 3 bricks in dispersed mode 2+1, it fails like this:
root@gl1:~# gluster volume add-brick vol1 gl4:/data/br1 gl5:/data/br1 gl6:/data/br1
volume add-brick: failed: Pre Validation failed on gl4. Host gl5 not connected
However, all peers are connected and there are no networking issues:
root@gl1:~# gluster peer status
Number of Peers: 5
Hostname: gl2
Uuid: 616f100f-a3f4-46e4-b161-ee5db5a60e26
State: Peer in Cluster (Connected)
Hostname: gl3
Uuid: acb828b8-f4b3-42ab-a9d2-b3e7b917dc9a
State: Peer in Cluster (Connected)
Hostname: gl4
Uuid: 813ad056-5e84-4fdb-ac13-38d24c748bc4
State: Peer in Cluster (Connected)
Hostname: gl5
Uuid: a7933aeb-b08b-4ebb-a797-b8ecbe5a03c6
State: Peer in Cluster (Connected)
Hostname: gl6
Uuid: 63c9a6c1-0adf-4cf5-af7b-b28a60911c99
State: Peer in Cluster (Connected)
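Worth noting: peer status reflects the view of whichever node it is run on, so a single-node check can miss an asymmetric failure like this one. A hypothetical sweep, assuming passwordless SSH between the nodes:

for h in gl1 gl2 gl3 gl4 gl5 gl6; do
  echo "== $h =="
  ssh root@$h gluster peer status
done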
When I try a second time, the error is different:
root@gl1:~# gluster volume add-brick vol1 gl4:/data/br1 gl5:/data/br1 gl6:/data/br1
volume add-brick: failed: Pre Validation failed on gl5. /data/br1 is already part of a volume
Pre Validation failed on gl6. /data/br1 is already part of a volume
Pre Validation failed on gl4. /data/br1 is already part of a volume
It seems the previous attempt, even though it failed, created the gluster attributes on the file system, as shown by attr on gl4/5/6:
Attribute "glusterfs.volume-id" has a 16 byte value for /data/br1
I already purged gluster and reformatted the bricks on gl4/5/6 but the issue persists. Any ideas? Did I miss something?
Some information:
root@gl1:~# gluster volume status
Status of volume: vol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gl1:/data/br1                         49152     0          Y       23403
Brick gl2:/data/br1                         49152     0          Y       14545
Brick gl3:/data/br1                         49152     0          Y       11348
Self-heal Daemon on localhost               N/A       N/A        Y       24766
Self-heal Daemon on gl4                     N/A       N/A        Y       1087
Self-heal Daemon on gl5                     N/A       N/A        Y       1080
Self-heal Daemon on gl3                     N/A       N/A        Y       12321
Self-heal Daemon on gl2                     N/A       N/A        Y       15496
Self-heal Daemon on gl6                     N/A       N/A        Y       1091

Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks
root@gl1:~# gluster volume info
Volume Name: vol1
Type: Disperse
Volume ID: bb563884-0e2a-4757-9fd5-cb851ba113c3
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gl1:/data/br1
Brick2: gl2:/data/br1
Brick3: gl3:/data/br1
Options Reconfigured:
features.scrub-freq: hourly
features.scrub: Inactive
features.bitrot: off
cluster.disperse-self-heal-daemon: enable
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
--
~ Atin (atinm)
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users