Re: update to 4.1.6-1 and fix-layout failing

Nithya Balachandran <nbalacha@xxxxxxxxxx> · Mon, 7 Jan 2019 20:48:31 +0530

On Fri, 4 Jan 2019 at 17:10, mohammad kashif <kashif.alig@xxxxxxxxx> wrote:
Hi Nithya

The rebalance logs have only these warnings:
[2019-01-04 09:59:20.826261] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-atlasglust-client-5: error returned while attempting to connect to host:(null), port:0
[2019-01-04 09:59:20.828113] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-atlasglust-client-6: error returned while attempting to connect to host:(null), port:0
[2019-01-04 09:59:20.832017] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-atlasglust-client-4: error returned while attempting to connect to host:(null), port:0 

Please send me the rebalance logs if possible. Are 08 and 09 the newly added nodes? Are no directories being created on them?

gluster volume rebalance atlasglust status
                               Node                                    status           run time in h:m:s
                          ---------                               -----------                ------------
                          localhost                             fix-layout in progress        1:0:59
     pplxgluster02.physics.ox.ac.uk                             fix-layout in progress        1:0:59
     pplxgluster03.physics.ox.ac.uk                             fix-layout in progress        1:0:59
     pplxgluster04.physics.ox.ac.uk                             fix-layout in progress        1:0:59
     pplxgluster05.physics.ox.ac.uk                             fix-layout in progress        1:0:59
     pplxgluster06.physics.ox.ac.uk                             fix-layout in progress        1:0:59
     pplxgluster07.physics.ox.ac.uk                             fix-layout in progress        1:0:59
     pplxgluster08.physics.ox.ac.uk                             fix-layout in progress        1:0:59
     pplxgluster09.physics.ox.ac.uk                             fix-layout in progress        1:0:59

But there have been no new entries in the logs for the last hour, and I can't see any new directories being created.

Thanks

Kashif

On Fri, Jan 4, 2019 at 10:42 AM Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:

On Fri, 4 Jan 2019 at 15:48, mohammad kashif <kashif.alig@xxxxxxxxx> wrote:
Hi 

I have updated our distributed Gluster storage from 3.12.9-1 to 4.1.6-1. The existing cluster had seven servers totalling around 450 TB. The OS is CentOS 7. The update went OK and I could access files.
Then I added two more servers of 90 TB each to the cluster and started fix-layout:

gluster volume rebalance atlasglust fix-layout start 

Some directories were created on the new servers, and then creation stopped, although the rebalance status showed that it was still running. I think it stopped creating new directories after this error:

E [MSGID: 106061] [glusterd-utils.c:10697:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd: failed to get index
The message "E [MSGID: 106061] [glusterd-utils.c:10697:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd: failed to get index" repeated 7 times between [2019-01-03 13:16:31.146779] and [2019-01-03 13:16:31.158612]

There are also many warnings like this:
[2019-01-03 16:04:34.120777] I [MSGID: 106499] [glusterd-handler.c:4314:__glusterd_handle_status_volume] 0-management: Received status volume req for volume atlasglust
[2019-01-03 17:04:28.541805] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-management: error returned while attempting to connect to host:(null), port:0

These are the glusterd logs. Do you see any errors in the rebalance logs for this volume?

I waited for around 12 hours, then stopped fix-layout and started it again.
I can see the same error again:

[2019-01-04 09:59:20.825930] E [MSGID: 106061] [glusterd-utils.c:10697:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd: failed to get index
The message "E [MSGID: 106061] [glusterd-utils.c:10697:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd: failed to get index" repeated 7 times between [2019-01-04 09:59:20.825930] and [2019-01-04 09:59:20.837068] 

Please advise, as this is our production service.

At the moment, I have stopped clients from using the file system. Would it be OK to allow clients to access the file system while fix-layout is still running?

Thanks

Kashif
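
For anyone comparing the glusterd and rebalance logs quoted in this thread, a minimal Python sketch along these lines could tally entries by severity, MSGID and source, and record when each was last seen, which may help show whether logging really stopped. The regular expression is inferred only from the lines quoted above, not from any Gluster format specification, and the names LOG_LINE, parse_line and summarize are made up for illustration.

```python
import re
from collections import Counter

# Shape inferred from the log lines quoted in this thread (an assumption,
# not an official format):
#   [timestamp] LEVEL [MSGID: n] [file:line:function] domain: message
# The "[MSGID: n]" part is optional, as in the rpc-clnt.c warnings.
LOG_LINE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"(?:\[MSGID: (?P<msgid>\d+)\]\s+)?"
    r"\[(?P<source>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count entries per (level, msgid, source) and keep the last timestamp seen."""
    counts = Counter()
    last_seen = {}
    for line in lines:
        record = parse_line(line)
        if record is None:
            # Skips lines such as 'The message "..." repeated 7 times ...'
            continue
        key = (record["level"], record["msgid"], record["source"])
        counts[key] += 1
        last_seen[key] = record["ts"]
    return counts, last_seen


# Sample input copied from the log excerpts in this thread.
SAMPLE = [
    '[2019-01-04 09:59:20.825930] E [MSGID: 106061] [glusterd-utils.c:10697:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd: failed to get index',
    '[2019-01-04 09:59:20.826261] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-atlasglust-client-5: error returned while attempting to connect to host:(null), port:0',
    '[2019-01-04 09:59:20.828113] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-atlasglust-client-6: error returned while attempting to connect to host:(null), port:0',
    'The message "E [MSGID: 106061] [glusterd-utils.c:10697:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd: failed to get index" repeated 7 times between [2019-01-04 09:59:20.825930] and [2019-01-04 09:59:20.837068]',
]

if __name__ == "__main__":
    counts, last_seen = summarize(SAMPLE)
    for key, n in counts.most_common():
        print(key, n, "last seen", last_seen[key])
```

To run it against a real log, pass an open file object to summarize() instead of SAMPLE; the location of the rebalance log depends on the installation. The "repeated N times" summary lines are skipped rather than expanded, so the counts reflect only the individual entries written to the log.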

_______________________________________________

Gluster-users mailing list

Gluster-users@xxxxxxxxxxx

https://lists.gluster.org/mailman/listinfo/gluster-users
