Hi Nithya,

On 08/30/2018 09:45 AM, Nithya Balachandran wrote:
> Hi Richard,
>
> On 29 August 2018 at 18:11, Richard Neuboeck <hawk@xxxxxxxxxxxxxxxx
> <mailto:hawk@xxxxxxxxxxxxxxxx>> wrote:
>
>     Hi Gluster Community,
>
>     I have a problem with a glusterfs 'Transport endpoint not
>     connected' connection abort during file transfers that I can now
>     replicate every time, but whose cause I cannot pinpoint.
>
>     The volume is set up in replica 3 mode and accessed with the fuse
>     gluster client. Both client and servers are running CentOS and the
>     supplied gluster version 3.12.11.
>
>     The connection abort happens at different points during the rsync
>     run, but it occurs every time I try to sync all our files (1.1 TB)
>     to the empty volume.
>
>     On neither the client nor the server side do I find errors in the
>     gluster log files. rsync logs the obvious transfer problem. The
>     only log that shows anything related is the server brick log,
>     which states that the connection is shutting down:
>
>     [2018-08-18 22:40:35.502510] I [MSGID: 115036]
>     [server.c:527:server_rpc_notify] 0-home-server: disconnecting
>     connection from
>     brax-110405-2018/08/16-08:36:28:575972-home-client-0-0-0
>     [2018-08-18 22:40:35.502620] W
>     [inodelk.c:499:pl_inodelk_log_cleanup] 0-home-server: releasing
>     lock on eaeb0398-fefd-486d-84a7-f13744d1cf10 held by
>     {client=0x7f83ec0b3ce0, pid=110423 lk-owner=d0fd5ffb427f0000}
>     [2018-08-18 22:40:35.502692] W
>     [entrylk.c:864:pl_entrylk_log_cleanup] 0-home-server: releasing
>     lock on faa93f7b-6c46-4251-b2b2-abcd2f2613e1 held by
>     {client=0x7f83ec0b3ce0, pid=110423 lk-owner=703dd4cc407f0000}
>     [2018-08-18 22:40:35.502719] W
>     [entrylk.c:864:pl_entrylk_log_cleanup] 0-home-server: releasing
>     lock on faa93f7b-6c46-4251-b2b2-abcd2f2613e1 held by
>     {client=0x7f83ec0b3ce0, pid=110423 lk-owner=703dd4cc407f0000}
>     [2018-08-18 22:40:35.505950] I [MSGID: 101055]
>     [client_t.c:443:gf_client_unref] 0-home-server: Shutting down
>     connection
>     brax-110405-2018/08/16-08:36:28:575972-home-client-0-0-0
>
>     Since I've been running another replica 3 setup for oVirt for a
>     long time
>
> Is this setup running with the same gluster version and on the same
> nodes or is it a different cluster?

It's a different cluster (sphere-one, sphere-two and sphere-three), but
with the same gluster version and basically the same hardware.

Cheers
Richard

>     now, which is completely stable, I thought at first that I had
>     made a mistake by setting different options. However, even after
>     resetting those options I am able to reproduce the connection
>     problem.
>
>     The unoptimized volume setup looks like this:
>
>     Volume Name: home
>     Type: Replicate
>     Volume ID: c92fa4cc-4a26-41ff-8c70-1dd07f733ac8
>     Status: Started
>     Snapshot Count: 0
>     Number of Bricks: 1 x 3 = 3
>     Transport-type: tcp
>     Bricks:
>     Brick1: sphere-four:/srv/gluster_home/brick
>     Brick2: sphere-five:/srv/gluster_home/brick
>     Brick3: sphere-six:/srv/gluster_home/brick
>     Options Reconfigured:
>     nfs.disable: on
>     transport.address-family: inet
>     cluster.quorum-type: auto
>     cluster.server-quorum-type: server
>     cluster.server-quorum-ratio: 50%
>
>     The following additional options were used before:
>
>     performance.cache-size: 5GB
>     client.event-threads: 4
>     server.event-threads: 4
>     cluster.lookup-optimize: on
>     features.cache-invalidation: on
>     performance.stat-prefetch: on
>     performance.cache-invalidation: on
>     network.inode-lru-limit: 50000
>     features.cache-invalidation-timeout: 600
>     performance.md-cache-timeout: 600
>     performance.parallel-readdir: on
>
>     In this case the gluster servers and also the client are using a
>     bonded network device running in adaptive load balancing mode.
>
>     I've tried using the debug option for the client mount, but apart
>     from a ~0.5 TB log file I didn't get any information that seems
>     helpful to me.
>
>     Transferring just a couple of GB works without problems.
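[Editorial aside, not part of the original mail: since rsync resumes an interrupted transfer where it left off, a stopgap while the root cause is being hunted is to wrap the sync in a retry loop, so a single 'Transport endpoint is not connected' abort does not cost the whole 1.1 TB run. A minimal sketch; the function name, retry count and delay are placeholders, not anything from the thread:]

```shell
# Hedged sketch: retry a command (e.g. the failing rsync) a few times
# before giving up. rsync picks up where the previous run stopped, so a
# transient fuse-mount abort only costs one retry, not the whole sync.
retry_sync() {
    tries=0
    max_tries=5
    until "$@"; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            echo "giving up after $tries failed attempts" >&2
            return 1
        fi
        sleep 2  # give the fuse mount a moment to recover
    done
    return 0
}

# Example invocation (paths are placeholders):
# retry_sync rsync -aHAX /local/home/ /mnt/home/
```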
>     It may very well be that I'm already blind to the obvious, but
>     after many long-running tests I can't find the crux of the
>     problem in this setup.
>
>     Does anyone have an idea of how to approach this problem in a way
>     that sheds some useful information?
>
>     Any help is highly appreciated!
>     Cheers
>     Richard
>
>     --
>     /dev/null

--
/dev/null
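[Editorial aside, not part of the original mail: since the abort is reproducible, one approach is to line up rsync's failure time with the disconnect and lock-cleanup messages in the brick log (MSGID 115036 and the pl_*lk_log_cleanup warnings quoted above). A small helper of this shape can pull out just those lines; the function name is a placeholder, and the example path follows the conventional brick-log naming, so adjust it to the actual brick:]

```shell
# Hedged sketch: extract disconnect and lock-cleanup events from a
# glusterfsd brick log so their timestamps can be correlated with the
# rsync abort. The patterns match the messages quoted in this thread.
scan_brick_log() {
    grep -E 'server_rpc_notify|pl_(inode|entry)lk_log_cleanup|gf_client_unref' "$1"
}

# Example (conventional brick log location, adjust as needed):
# scan_brick_log /var/log/glusterfs/bricks/srv-gluster_home-brick.log
```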
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users