Hi Gluster Community,

I am having a problem with a GlusterFS 'Transport endpoint is not connected' connection abort during file transfers. I can now reproduce it every time, but I cannot pinpoint why it happens.

The volume is set up in replica 3 mode and accessed with the fuse gluster client. Both the client and the servers are running CentOS with the supplied 3.12.11 version of gluster. The connection abort happens at different times during rsync, but it occurs every time I try to sync all our files (1.1TB) to the empty volume.

On neither the client nor the server side do I find errors in the gluster log files; rsync only logs the obvious transfer problem. The only log that shows anything related is the server brick log, which states that the connection is shutting down:

[2018-08-18 22:40:35.502510] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-home-server: disconnecting connection from brax-110405-2018/08/16-08:36:28:575972-home-client-0-0-0
[2018-08-18 22:40:35.502620] W [inodelk.c:499:pl_inodelk_log_cleanup] 0-home-server: releasing lock on eaeb0398-fefd-486d-84a7-f13744d1cf10 held by {client=0x7f83ec0b3ce0, pid=110423 lk-owner=d0fd5ffb427f0000}
[2018-08-18 22:40:35.502692] W [entrylk.c:864:pl_entrylk_log_cleanup] 0-home-server: releasing lock on faa93f7b-6c46-4251-b2b2-abcd2f2613e1 held by {client=0x7f83ec0b3ce0, pid=110423 lk-owner=703dd4cc407f0000}
[2018-08-18 22:40:35.502719] W [entrylk.c:864:pl_entrylk_log_cleanup] 0-home-server: releasing lock on faa93f7b-6c46-4251-b2b2-abcd2f2613e1 held by {client=0x7f83ec0b3ce0, pid=110423 lk-owner=703dd4cc407f0000}
[2018-08-18 22:40:35.505950] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-home-server: Shutting down connection brax-110405-2018/08/16-08:36:28:575972-home-client-0-0-0

Since I have been running another replica 3 setup for oVirt for a long time now, which is completely stable, I initially thought I had made a mistake by setting different options. However, even after resetting those options I am able to reproduce the connection problem. The unoptimized volume setup looks like this:

Volume Name: home
Type: Replicate
Volume ID: c92fa4cc-4a26-41ff-8c70-1dd07f733ac8
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: sphere-four:/srv/gluster_home/brick
Brick2: sphere-five:/srv/gluster_home/brick
Brick3: sphere-six:/srv/gluster_home/brick
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 50%

The following additional options were used before:

performance.cache-size: 5GB
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
features.cache-invalidation: on
performance.stat-prefetch: on
performance.cache-invalidation: on
network.inode-lru-limit: 50000
features.cache-invalidation-timeout: 600
performance.md-cache-timeout: 600
performance.parallel-readdir: on

In this case the gluster servers and also the client are using a bonded network device running in adaptive load balancing mode.

I have tried using the debug option for the client mount, but except for a ~0.5TB log file I did not get any information that seems helpful to me. Transferring just a couple of GB works without problems.

It may very well be that I am already blind to the obvious, but after many long-running tests I cannot find the crux in the setup. Does anyone have an idea of how to approach this problem in a way that sheds some useful information?

Any help is highly appreciated!

Cheers
Richard

--
/dev/null
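
PS: In case it helps to see the workflow, this is roughly what I run; the mount point, source path and rsync flags below are placeholders rather than my exact commands:

# reset one of the previously tuned options (repeated for each option above)
gluster volume reset home performance.parallel-readdir

# fuse mount with debug logging written to its own file
mount -t glusterfs -o log-level=DEBUG,log-file=/var/log/glusterfs/home-debug.log sphere-four:/home /mnt/home

# sync everything onto the (empty) volume
rsync -aAXH --progress /data/home/ /mnt/home/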