Hi Gluster experts,
Good day!
I am checking one issue related to Glusterfs, and found following issue.
Description of issue:
1>our cluster env has three storage node , sn-0(169.254.0.24)and sn-1(169.254.0.23) is deployed as replicate storage nodes. Sn-2 is quorum node.
2>stop sn-0 node(so sn-0 become unreachable suddenly), and execute mound command: time mount -t glusterfs 169.254.0.24:/log /mnt/test2 -o rw,backupvolfile-server=169.254.0.23
3> this command will not finish until 5 minutes pass.
4> wait some time (10 minutes) , execute this mount command again, the same command will finish within several seconds.
My analysis:
On the second time the mount command could return quickly, while the first time mount hang 5 minutes, I compare the glusterfs log file, and find
1>on the first time the rpc client reconnect keeps trying until following prints:
[2016-11-17 02:50:23.593340] E [socket.c:2282:socket_connect_finish] 0-log-client-0: connection to 169.254.0.24:24007 failed (Connection timed out)
[2016-11-17 02:50:23.593360] T [socket.c:738:__socket_disconnect] 0-log-client-0: disconnecting 0xe2a280, state=0 gen=0 sock=12
[2016-11-17 02:50:23.593370] D [socket.c:719:__socket_shutdown] 0-log-client-0: shutdown() returned -1. Transport endpoint is not connected
[2016-11-17 02:50:23.593378] D [socket.c:2359:socket_event_handler] 0-transport: disconnecting now
but the begin time of this retry is:
[2016-11-17 02:48:20.416900] T [rpc-clnt.c:418:rpc_clnt_reconnect] 0-log-client-0: attempting reconnect
Glusterfs process try a lot of times and the time elapse is around 2 minutes
2>On the second time the rpc client also keeps reconnecting but each time it returns very quickly with following prints:
[2016-11-17 04:27:06.045700] E [socket.c:2282:socket_connect_finish] 0-log-client-0: connection to 169.254.0.24:24007 failed (No route to host)
[2016-11-17 04:27:06.045751] T [socket.c:738:__socket_disconnect] 0-log-client-0: disconnecting 0xffc4a0, state=0 gen=0 sock=12
[2016-11-17 04:27:06.045760] D [socket.c:719:__socket_shutdown] 0-log-client-0: shutdown() returned -1. Transport endpoint is not connected
[2016-11-17 04:27:06.045766] D [socket.c:2359:socket_event_handler] 0-transport: disconnecting now
Questions:
Why socket_connect_finish is called much later for the first time mount? I assume it is because in the first mount connect() timeout is slowly and the second
time connect() return much faster?
Is there any way to control the rpc_clnt_reconnect times? Because seems this is a waste, no need to retry so many times.
Attachment: two times mount trace log of glusterfs.
Best regards,
Cynthia (周琳)
Cynthia (周琳)
MBB SM HETRAN SW3 MATRIX
Storage
Mobile: +86 (0)18657188311
Mobile: +86 (0)18657188311
<<attachment: when_nova_stop_sn-0_mount_slow_on_sn1_and_sn2_issue.zip>>
_______________________________________________ Gluster-devel mailing list Gluster-devel@xxxxxxxxxxx http://www.gluster.org/mailman/listinfo/gluster-devel