Re: Pre Validation failed when adding bricks

Hi Atin,

Nice catch: as you said, there was a mistake in the hosts file of gl4 (gl5 and gl6 were missing). Now it works fine.

Thanks,

Cédric


On 14 Dec 2016, at 05:10, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:



On Wed, Dec 14, 2016 at 9:36 AM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
From the gl4.dump file:

glusterd.peer4.hostname=gl5
glusterd.peer4.port=0
glusterd.peer4.state=3
glusterd.peer4.quorum-action=0
glusterd.peer4.quorum-contrib=2
glusterd.peer4.detaching=0
glusterd.peer4.locked=0
glusterd.peer4.rpc.peername=
glusterd.peer4.rpc.connected=0       <===== this indicates that gl5 is not connected with gl4, so the add-brick command failed, as it is supposed to in this case
glusterd.peer4.rpc.total-bytes-read=0
glusterd.peer4.rpc.total-bytes-written=0
glusterd.peer4.rpc.ping_msgs_sent=0
glusterd.peer4.rpc.msgs_sent=0

And the same holds true for gl6 as per this dump. So the issue is with the gl4 node.
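
As a quick way to pull these fields out of the statedumps (file name as shared in this thread), something like this works:

grep -E 'peer[0-9]+\.(hostname|rpc\.connected)' gl4.dump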

Now, in gl4's glusterd log I see repetitive entries of the following logs:

[2016-12-13 16:35:31.438462] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl5
[2016-12-13 16:35:33.440155] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl6
[2016-12-13 16:35:34.441639] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl5
[2016-12-13 16:35:36.454546] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl6
[2016-12-13 16:35:37.456062] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl5

The above indicates that gl4 is not able to resolve the DNS names for gl5 & gl6, whereas gl5 & gl6 could resolve gl4. Please check your DNS configuration and see if there are any incorrect entries in it. From our side, what we need to check is why peer status didn't show both gl5 & gl6 as disconnected.
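
For example, a quick check from gl4 could be something like this (getent hosts walks the same NSS lookup path, /etc/hosts then DNS, that glusterd's resolver normally uses; hostnames as in this thread):

for h in gl1 gl2 gl3 gl5 gl6; do getent hosts "$h" || echo "cannot resolve: $h"; done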

Can you run gluster peer status from gl4 and see if both gl5 & gl6 are mentioned as disconnected? If so, that is expected. Since gl5 & gl6 were connected to all the nodes apart from gl4, peer status on all the other nodes would show them as connected, and that's the expected behaviour. Please do confirm.
 


On Wed, Dec 14, 2016 at 12:44 AM, Cedric Lemarchand <yipikai7@xxxxxxxxx> wrote:
Thanks Atin, here are the files you asked for: https://we.tl/XrOvFhffGq

On 13 Dec 2016, at 19:08, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

Thanks, we will get back to you on this. In the meantime, can you please also share the glusterd statedump files from both nodes? The way to take a statedump is 'kill -SIGUSR1 $(pidof glusterd)' and the file can be found in the /var/run/gluster directory.
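
For example (the glusterdump.* file name is what I'd expect from recent releases; it may differ slightly between versions):

kill -SIGUSR1 $(pidof glusterd)
ls -t /var/run/gluster/glusterdump.* 2>/dev/null | head -1   # the newest file is the one to share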

On Tue, 13 Dec 2016 at 22:11, Cedric Lemarchand <yipikai7@xxxxxxxxx> wrote:
1. sorry, 3.9.0-1
2. no, it does nothing
3. here they are, from gl1 to gl6: https://we.tl/EPaMs6geoR



On 13 Dec 2016, at 16:49, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

And 3. In case 2 doesn't work, please provide the glusterd log files from gl1 & gl5

On Tue, Dec 13, 2016 at 9:16 PM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
1. Could you mention which gluster version you are running?
2. Does restarting the glusterd instance on gl1 & gl5 solve the issue (after removing the volume-id xattr from the bricks, e.g. as sketched below)?
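
For reference, clearing that xattr on each brick would look something like this (brick path taken from this thread; the trusted.gfid attribute and the .glusterfs directory may not exist if add-brick failed early):

setfattr -x trusted.glusterfs.volume-id /data/br1   # drop the volume-id left by the failed attempt
setfattr -x trusted.gfid /data/br1                  # only if present
rm -rf /data/br1/.glusterfs                         # only if present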

On Tue, Dec 13, 2016 at 8:56 PM, Cedric Lemarchand <yipikai7@xxxxxxxxx> wrote:
Hello,





When I try to add 3 bricks to a working cluster composed of 3 nodes / 3 bricks in dispersed mode 2+1, it fails like this:

root@gl1:~# gluster volume add-brick vol1 gl4:/data/br1 gl5:/data/br1 gl6:/data/br1
volume add-brick: failed: Pre Validation failed on gl4. Host gl5 not connected
However all peers are connected and there are no networking issues:

root@gl1:~# gluster peer status
Number of Peers: 5

Hostname: gl2
Uuid: 616f100f-a3f4-46e4-b161-ee5db5a60e26
State: Peer in Cluster (Connected)

Hostname: gl3
Uuid: acb828b8-f4b3-42ab-a9d2-b3e7b917dc9a
State: Peer in Cluster (Connected)

Hostname: gl4
Uuid: 813ad056-5e84-4fdb-ac13-38d24c748bc4
State: Peer in Cluster (Connected)

Hostname: gl5
Uuid: a7933aeb-b08b-4ebb-a797-b8ecbe5a03c6
State: Peer in Cluster (Connected)

Hostname: gl6
Uuid: 63c9a6c1-0adf-4cf5-af7b-b28a60911c99
State: Peer in Cluster (Connected)

When I try a second time, the error is different:

root@gl1:~# gluster volume add-brick vol1 gl4:/data/br1 gl5:/data/br1 gl6:/data/br1
volume add-brick: failed: Pre Validation failed on gl5. /data/br1 is already part of a volume
Pre Validation failed on gl6. /data/br1 is already part of a volume
Pre Validation failed on gl4. /data/br1 is already part of a volume

It seems the previous try, even though it failed, has created the gluster attributes on the file system, as shown by attr on gl4/5/6:

Attribute "glusterfs.volume-id" has a 16 byte value for /data/br1
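
For what it's worth, the raw value can also be dumped with getfattr (standard attr tooling, nothing Gluster-specific):

getfattr -d -m . -e hex /data/br1   # lists all extended attributes in hex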





I already purged gluster and reformatted the bricks on gl4/5/6, but the issue persists. Any ideas? Did I miss something?

Some information:

root@gl1:~# gluster volume status
Status of volume: vol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gl1:/data/br1                         49152     0          Y       23403
Brick gl2:/data/br1                         49152     0          Y       14545
Brick gl3:/data/br1                         49152     0          Y       11348
Self-heal Daemon on localhost               N/A       N/A        Y       24766
Self-heal Daemon on gl4                     N/A       N/A        Y       1087
Self-heal Daemon on gl5                     N/A       N/A        Y       1080
Self-heal Daemon on gl3                     N/A       N/A        Y       12321
Self-heal Daemon on gl2                     N/A       N/A        Y       15496
Self-heal Daemon on gl6                     N/A       N/A        Y       1091

Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks

root@gl1:~# gluster volume info

Volume Name: vol1
Type: Disperse
Volume ID: bb563884-0e2a-4757-9fd5-cb851ba113c3
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gl1:/data/br1
Brick2: gl2:/data/br1
Brick3: gl3:/data/br1
Options Reconfigured:
features.scrub-freq: hourly
features.scrub: Inactive
features.bitrot: off
cluster.disperse-self-heal-daemon: enable
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

root@gl1:~# gluster peer status
Number of Peers: 5

Hostname: gl2
Uuid: 616f100f-a3f4-46e4-b161-ee5db5a60e26
State: Peer in Cluster (Connected)

Hostname: gl3
Uuid: acb828b8-f4b3-42ab-a9d2-b3e7b917dc9a
State: Peer in Cluster (Connected)

Hostname: gl4
Uuid: 813ad056-5e84-4fdb-ac13-38d24c748bc4
State: Peer in Cluster (Connected)

Hostname: gl5
Uuid: a7933aeb-b08b-4ebb-a797-b8ecbe5a03c6
State: Peer in Cluster (Connected)

Hostname: gl6
Uuid: 63c9a6c1-0adf-4cf5-af7b-b28a60911c99
State: Peer in Cluster (Connected)

--
~ Atin (atinm)

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
