Random errors with "Transport endpoint is not connected"

Hi all,
First of all, glusterfs is a nice idea and I like it a lot, but it needs to be a rock-solid product, so I ran some tests. I hope this bug report will help you.

Configuration:
3 identical servers and a client, connected with TCP/IP
Servers: Mandrake 9.x (compiled and installed without problems); client: CentOS-5 x86_64, with fuse 2.6.5 compiled and installed from source. Everything went OK.

Those 3 servers provide 3 simple bricks, joined at the client in a full-mirror (AFR x 3) configuration plus the read-ahead and write-behind translators. I created a PostgreSQL tablespace (zone) on the mounted /mnt/gfs, copied a 50 MB table there, then stressed it with various operations.
10 reads and full updates of every row in the table succeeded.

After a while, a simple "vacuum full analyze" gave the error:
glu=# vacuum full analyze;
ERROR: could not read block 43155 of relation 527933664/527933665/527933666: Transport endpoint is not connected

I repeated the tests many times; after 2-3 minutes of operation I got the same error, in a different place each time but mostly in WRITE operations. I deleted the whole mounted client disk and rebuilt it with the STRIPE translator instead of AFR. The behaviour is the same: after a couple of successful operations, I got a failure.
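
For anyone who wants to try reproducing this without PostgreSQL, here is a minimal sketch of a read/rewrite loop. It is not what I actually ran: the function name, the file name, the GFS_DIR environment variable and the block counts are all made up for illustration, and the 8192-byte block size is only chosen to match PostgreSQL's default page size. It reports the first errno it hits, so ENOTCONN would show up as "Transport endpoint is not connected".

```python
import errno
import os
import tempfile


def stress_file(path, block_size=8192, blocks=256, rounds=10):
    """Write a file, then re-read and rewrite every block `rounds` times.

    Returns the errno of the first OSError encountered, or None if every
    round completed. All parameters are illustrative defaults.
    """
    payload = os.urandom(block_size)
    try:
        # Initial fill, roughly like copying a table into the tablespace.
        with open(path, "wb") as f:
            for _ in range(blocks):
                f.write(payload)
        # Repeated "read every block, then update it in place" passes.
        for _ in range(rounds):
            with open(path, "r+b") as f:
                for i in range(blocks):
                    f.seek(i * block_size)
                    data = f.read(block_size)
                    if len(data) != block_size:
                        raise OSError(errno.EIO, "short read at block %d" % i)
                    f.seek(i * block_size)
                    f.write(data)
                f.flush()
                os.fsync(f.fileno())
    except OSError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    # Set GFS_DIR=/mnt/gfs to hit the GlusterFS mount from this report;
    # without it the script falls back to the local temp directory.
    target_dir = os.environ.get("GFS_DIR", tempfile.gettempdir())
    err = stress_file(os.path.join(target_dir, "stress.bin"))
    if err is None:
        print("all rounds succeeded")
    else:
        print("failed:", os.strerror(err))
```

Running it in a loop for a few minutes against the mount would be the closest equivalent to the test above, but a plain file workload may not trigger the same code path as PostgreSQL does.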

Those were the facts; now the logs and configuration files.

The client debug log shows these errors:
[1:48:22] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
[1:48:22] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write to break on blocked socket (if any)
[1:48:22] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
[1:48:22] [ERROR/common-utils.c:110/full_rwv()] libglusterfs:full_rwv: 6689 bytes r/w instead of 8539 (Broken pipe)
[1:48:22] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
[1:48:22] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write to break on blocked socket (if any)
[1:48:22] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
...
...
[1:48:22] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write to break on blocked socket (if any)
[1:48:22] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
[1:48:22] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write to break on blocked socket (if any)
[1:48:22] [DEBUG/client-protocol.c:2708/client_protocol_interpret()] protocol/client:frame not found for blk with callid: 139893
[1:48:22] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()] protocol/client:cleaning up state in transport object 0x16595730
[1:48:22] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client1: connection to server disconnected
[1:48:22] [CRITICAL/common-utils.c:215/gf_print_trace()] debug-backtrace:Got signal (11), printing backtrace
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/libglusterfs.so.0(gf_print_trace+0x21) [0x2aaaaaccf4a1]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/lib64/libc.so.6 [0x2aaaab53b070]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/glusterfs/1.3.0-pre4/xlator/performance/read-ahead.so(ra_frame_return+0x142) [0x2aaaac4da2a2]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/glusterfs/1.3.0-pre4/xlator/performance/read-ahead.so [0x2aaaac4d9daa]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:[glusterfs] [0x40910b]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib64/libfuse.so.2 [0x2aaaaaee3059]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:[glusterfs] [0x402f29]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/libglusterfs.so.0(sys_epoll_iteration+0xd4) [0x2aaaaacd0ef4]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:[glusterfs] [0x402898]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/lib64/libc.so.6(__libc_start_main+0xf4) [0x2aaaab5288a4]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:[glusterfs] [0x4025f9]

The server configuration files look like this:
-------------------
volume brick
       type storage/posix
       option directory /var/gldata
end-volume

volume server
       type protocol/server
       option transport-type tcp/server
       option listen-port 6996
       option bind-address 29.11.276.x
       subvolumes brick
       option auth.ip.brick.allow *
end-volume
-------------------

The client configuration file is (the clientX block is repeated for X = 1, 2, 3):
-------------------
volume clientX            #{1,2,3}
  type protocol/client
  option transport-type tcp/client
  option remote-host A.B.C.X
  option remote-port 6996
  option remote-subvolume brick
end-volume


### Add AFR feature to brick
volume afr
 type cluster/afr
 subvolumes client1 client2 client3
 option replicate *:3                 # All files 3 copies
end-volume

#volume stripe
#   type cluster/stripe
#   subvolumes client1 client2 client3
#   option block-size *:256kB
#end-volume

#volume trace
#  type debug/trace
#  subvolumes afr
#  option debug on
#end-volume

volume writebehind
  type performance/write-behind
  option aggregate-size 131072 # aggregate block size in bytes
  subvolumes afr
end-volume

volume readahead
  type performance/read-ahead
  option page-size 131072 ### size in bytes
  option page-count 16 ### page-size x page-count is the amount of read-ahead data per file
  subvolumes writebehind
end-volume
-------------------





