Nithya,
Steve
[2017-10-17 02:22:13.453575] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-video-server: accepted client from node-dc4-03-5825-2017/08/30-20:45:55:170091-video-client-4-2-318 (version: 3.8.15)
[2017-10-17 02:22:31.353286] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-403
[2017-10-17 02:22:31.353326] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-403
[2017-10-17 02:22:42.288856] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-video-server: accepted client from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-404 (version: 3.8.13)
[2017-10-17 02:29:04.889303] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-404
[2017-10-17 02:29:04.889347] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-404
[2017-10-17 02:29:15.327604] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-video-server: accepted client from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-405 (version: 3.8.13)
[2017-10-17 02:33:30.745314] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-594
[2017-10-17 02:33:30.745360] I [MSGID: 115013] [server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
[2017-10-17 02:33:30.745396] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-594
[2017-10-17 02:33:41.563748] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-video-server: accepted client from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-595 (version: 3.8.13)
[2017-10-17 02:36:43.833304] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-595
[2017-10-17 02:36:43.833342] I [MSGID: 115013] [server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
[2017-10-17 02:36:43.833371] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-595
[2017-10-17 02:36:54.569836] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-video-server: accepted client from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-596 (version: 3.8.13)
[2017-10-17 02:38:16.697306] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-596
[2017-10-17 02:38:16.697370] I [MSGID: 115013] [server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
[2017-10-17 02:38:16.697432] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-596
[2017-10-17 02:38:34.591506] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-video-server: accepted client from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-597 (version: 3.8.13)
[2017-10-17 02:55:56.473306] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection from titan-17527-2017/09/18-19:57:41:611709-video-client-4-0-19
[2017-10-17 02:55:56.473366] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection titan-17527-2017/09/18-19:57:41:611709-video-client-4-0-19
[2017-10-17 02:56:07.161790] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-video-server: accepted client from titan-17527-2017/09/18-19:57:41:611709-video-client-4-0-20 (version: 3.8.8)
[2017-10-17 03:15:13.529281] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-597
[2017-10-17 03:15:13.529330] I [MSGID: 115013] [server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
[2017-10-17 03:15:13.529400] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-597
[2017-10-17 03:15:41.764247] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-video-server: accepted client from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-598 (version: 3.8.13)
[2017-10-17 03:20:28.921396] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection from node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-0
[2017-10-17 03:20:28.921498] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-0
[2017-10-17 03:20:39.348678] I [login.c:76:gf_auth] 0-auth/login: allowed user names: be603ada-6523-44d3-a900-zzzzzzzzzzzz
[2017-10-17 03:20:39.348909] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-video-server: accepted client from node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-1 (version: 3.8.7)
[2017-10-17 03:27:18.385374] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection from node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-1
[2017-10-17 03:27:18.385423] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-1
[2017-10-17 03:31:47.325285] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-598
[2017-10-17 03:31:47.325340] I [MSGID: 115013] [server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
[2017-10-17 03:31:47.325384] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-598
[2017-10-17 03:32:00.855905] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-video-server: accepted client from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-599 (version: 3.8.13)
[2017-10-17 03:33:23.001337] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-599
[2017-10-17 03:33:23.001400] I [MSGID: 115013] [server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
[2017-10-17 03:33:23.001450] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-599
[2017-10-17 03:33:33.860452] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-video-server: accepted client from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-600 (version: 3.8.13)
[2017-10-17 03:54:05.433317] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-405
[2017-10-17 03:54:05.433353] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-405
[2017-10-17 03:54:15.739343] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-video-server: accepted client from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-406 (version: 3.8.13)
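
To see which clients drop off this brick most often, the disconnect entries in a brick log like the one above can be tallied per client host. What follows is only a rough sketch and not something from the original thread. It assumes one log entry per line, keys on MSGID 115036 ("disconnecting connection from"), and assumes the client identifier has the shape <host>-<pid>-<YYYY/MM/DD>-... as seen in these entries. The names client_host and count_disconnects are made up for illustration.

```python
import re
from collections import Counter

# MSGID 115036 is the "disconnecting connection from <client>" entry in the brick log.
DISCONNECT_RE = re.compile(
    r"\[MSGID: 115036\].*?disconnecting connection from (?P<client>\S+)"
)

# Assumption: a client identifier looks like '<host>-<pid>-<YYYY/MM/DD>-...'.
CLIENT_HOST_RE = re.compile(r"(?P<host>.+?)-\d+-\d{4}/\d{2}/\d{2}-")


def client_host(client_id):
    """Return the host part of a client identifier, or the whole id if it does not fit."""
    match = CLIENT_HOST_RE.match(client_id)
    return match.group("host") if match else client_id


def count_disconnects(lines):
    """Count disconnect entries per client host across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = DISCONNECT_RE.search(line)
        if match:
            counts[client_host(match.group("client"))] += 1
    return counts


# Three entries copied from the brick log above: two disconnects and one accept.
SAMPLE = [
    "[2017-10-17 02:22:31.353286] I [MSGID: 115036] [server.c:548:server_rpc_notify] "
    "0-video-server: disconnecting connection from "
    "node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-403",
    "[2017-10-17 02:22:42.288856] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] "
    "0-video-server: accepted client from "
    "node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-404 (version: 3.8.13)",
    "[2017-10-17 02:29:04.889303] I [MSGID: 115036] [server.c:548:server_rpc_notify] "
    "0-video-server: disconnecting connection from "
    "node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-404",
]

if __name__ == "__main__":
    for host, n in count_disconnects(SAMPLE).most_common():
        print(host, n)
```

To run it against a real log, pass an open file object instead of SAMPLE, for example count_disconnects(open(path)) where path points at the brick log in question.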
On 17 October 2017 at 10:26, Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:
On 17 October 2017 at 14:48, Stephen Remde <stephen.remde@xxxxxxxxxxx> wrote:

Hi,

I have a rebalance that has failed on one peer twice now. Rebalance logs below (directories anonymised and some irrelevant log lines cut). It looks like it loses connection to the brick, but immediately stops the rebalance on that peer instead of waiting for reconnection - which happens a second or so later. Is this normal behaviour? So far it has been the same server and the same (remote) brick.

The brick shows a high number of disconnects compared to the other bricks on the same server:

./export-md0-brick.log.1    2
./export-md1-brick.log.1    2
./export-md2-brick.log.1  181
./export-md3-brick.log.1    2

Any clues? What could be causing this? There is nothing in the log to indicate the cause.

The rebalance process requires that all DHT child subvols be up during the operation as it needs to reapply the directory layouts (which requires all child subvols to be up). As this is a pure distribute volume, even a single brick getting disconnected is enough to cause the process to stop.

You would need to figure out why that brick is disconnecting so often.
The brick logs might help with that.

Regards,
Nithya

______________________________

Steve

gluster volume info video

Volume Name: video
Type: Distribute
Volume ID: ccdac37f-9b0e-415f-b62e-9071d8168199
Status: Started
Snapshot Count: 0
Number of Bricks: 9
Transport-type: tcp
Bricks:
Brick1: 10.0.0.31:/export/md0/brick
Brick2: 10.0.0.32:/export/md0/brick
Brick3: 10.0.0.31:/export/md1/brick
Brick4: 10.0.0.32:/export/md1/brick
Brick5: 10.0.0.31:/export/md2/brick
Brick6: 10.0.0.32:/export/md2/brick
Brick7: 10.0.0.31:/export/md3/brick
Brick8: 10.0.0.32:/export/md3/brick
Brick9: 10.0.0.33:/export/md0/brick
Options Reconfigured:
network.ping-timeout: 10
cluster.min-free-disk: 1%
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.rebal-throttle: lazy

[2017-10-12 23:00:55.099153] W [socket.c:590:__socket_rwv] 0-video-client-4: readv on 10.0.0.31:49164 failed (Connection reset by peer)
[2017-10-12 23:00:55.099709] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-video-client-4: disconnected from video-client-4. Client process will keep trying to connect to glusterd until brick's port is available
[2017-10-12 23:00:55.099741] W [MSGID: 109073] [dht-common.c:8839:dht_notify] 0-video-dht: Received CHILD_DOWN. Exiting
[2017-10-12 23:00:55.099752] I [MSGID: 109029] [dht-rebalance.c:4195:gf_defrag_stop] 0-: Received stop command on rebalance
[2017-10-12 23:01:05.478462] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-video-client-4: changing port to 49164 (from 0)
[2017-10-12 23:01:05.481180] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 0-video-client-4: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-10-12 23:01:05.482630] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 0-video-client-4: Connected to video-client-4, attached to remote volume '/export/md2/brick'.
[2017-10-12 23:01:05.482659] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 0-video-client-4: Server and Client lk-version numbers are not same, reopening the fds
[2017-10-12 23:01:05.483365] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-video-client-4: Server lk version = 1
[2017-10-12 23:01:30.310089] I [dht-rebalance.c:2819:gf_defrag_process_dir] 0-DHT: Found critical error from gf_defrag_get_entry
[2017-10-12 23:01:30.310166] E [MSGID: 109111] [dht-rebalance.c:3090:gf_defrag_fix_layout] 0-video-dht: gf_defrag_process_dir failed for directory: /y/y/y/y/y
[2017-10-12 23:01:30.380574] E [MSGID: 109016] [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for /y/y/y/y/y
[2017-10-12 23:01:30.380756] E [MSGID: 109016] [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for /y/y/y/y
[2017-10-12 23:01:30.380879] E [MSGID: 109016] [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for /y/y/y
[2017-10-12 23:01:30.380965] E [MSGID: 109016] [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for /y/y
[2017-10-12 23:03:09.285157] W [glusterfsd.c:1327:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f112b6d16ba] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55b325019545] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55b3250193b4] ) 0-: received signum (15), shutting down

[2017-10-17 03:20:28.921512] W [socket.c:590:__socket_rwv] 0-video-client-4: readv on 10.0.0.31:49164 failed (Connection reset by peer)
[2017-10-17 03:20:28.921554] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-video-client-4: disconnected from video-client-4. Client process will keep trying to connect to glusterd until brick's port is available
[2017-10-17 03:20:28.921570] W [MSGID: 109073] [dht-common.c:8839:dht_notify] 0-video-dht: Received CHILD_DOWN. Exiting
[2017-10-17 03:20:28.921578] I [MSGID: 109029] [dht-rebalance.c:4195:gf_defrag_stop] 0-: Received stop command on rebalance
[2017-10-17 03:20:39.344417] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-video-client-4: changing port to 49164 (from 0)
[2017-10-17 03:20:39.347440] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 0-video-client-4: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-10-17 03:20:39.349244] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 0-video-client-4: Connected to video-client-4, attached to remote volume '/export/md2/brick'.
[2017-10-17 03:20:39.349261] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 0-video-client-4: Server and Client lk-version numbers are not same, reopening the fds
[2017-10-17 03:20:39.350611] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-video-client-4: Server lk version = 1
[2017-10-17 03:27:17.231133] I [dht-rebalance.c:2819:gf_defrag_process_dir] 0-DHT: Found critical error from gf_defrag_get_entry
[2017-10-17 03:27:17.231214] E [MSGID: 109111] [dht-rebalance.c:3090:gf_defrag_fix_layout] 0-video-dht: gf_defrag_process_dir failed for directory: /x/x/x/x/x
[2017-10-17 03:27:17.562481] E [MSGID: 109016] [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for /x/x/x/x/x
[2017-10-17 03:27:17.562619] E [MSGID: 109016] [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for /x/x/x/x
[2017-10-17 03:27:17.562726] E [MSGID: 109016] [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for /x/x/x
[2017-10-17 03:27:17.562810] E [MSGID: 109016] [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for /x/x
[2017-10-17 03:27:18.379825] W [glusterfsd.c:1327:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f700b9696ba] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55f9c0022545] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55f9c00223b4] ) 0-: received signum (15), shutting down
_________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
Dr Stephen Remde
Director, Innovation and Research
T: 01535 280066
M: 07764 740920
E: stephen.remde@xxxxxxxxxxx
W: www.gaist.co.uk