Re: Random errors with "Transport endpoint is not connected"

Constantin,
 The random 'disconnection' is a known issue that has been fixed; the fix
is available in the checkout you get with 'tla get glusterfs--mainline--2.4'.
  With respect to running databases over glusterfs, we have not performed
an extensive analysis of this scenario. As a crude analysis, I would presume
that databases open their files with O_DIRECT, which disables all
performance enhancements.
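
One way to check the O_DIRECT presumption on a live system is to look at the
open flags the kernel reports for each descriptor of the database process.
The sketch below is only an illustration and assumes a Linux /proc
filesystem; the helper name fd_uses_o_direct is hypothetical and not part of
glusterfs or PostgreSQL.

```python
import os


def fd_uses_o_direct(pid, fd):
    """Return True if descriptor `fd` of process `pid` has O_DIRECT set.

    Assumes Linux: /proc/<pid>/fdinfo/<fd> has a "flags:" line in octal.
    """
    with open(f"/proc/{pid}/fdinfo/{fd}") as info:
        for line in info:
            if line.startswith("flags:"):
                flags = int(line.split()[1], 8)  # kernel prints octal
                return bool(flags & os.O_DIRECT)
    raise ValueError("no 'flags:' line found in fdinfo")


if __name__ == "__main__":
    # Demo on this process: a plain buffered open should not report O_DIRECT.
    demo_fd = os.open(os.devnull, os.O_RDONLY)
    try:
        print(fd_uses_o_direct("self", demo_fd))
    finally:
        os.close(demo_fd)
```

Running the helper against the descriptors listed in /proc/<pid>/fd of a
database backend would show whether the data files on the glusterfs mount
are really opened with O_DIRECT.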
  I am sure glusterfs is pretty much untuned for running databases at this
stage, though it should 'work'. With some feedback from users who run
databases on glusterfs, it would be possible to tune it further for the
best performance.
 GlusterFS aims to become 'best suited' for all types of applications,
which the translator design makes easy (you could load an alternative
translator whose code is tuned to deliver the best performance for that
application), and databases are definitely one of the applications we are
looking at in the long run.

thanks!
avati

2007/6/24, Constantin Teodorescu <teo@xxxxxxx>:

Hi all,
first of all, glusterfs is a nice idea and I like it a lot, but it needs to
be a rock-solid product, so I ran some tests. I hope this bug report
will help you.

Configuration:
3 identical servers and a client, connected over TCP/IP.
Servers: Mandrake 9.x (compiled and installed without problems); client:
CentOS-5 x86_64, fuse 2.6.5 compiled and installed from source. Everything
went OK.

Those 3 servers provide 3 simple bricks joined at the client in a full
mirror (afr x 3) configuration, plus the read-ahead and write-behind
translators.
I created a PostgreSQL tablespace (zone) on the mounted /mnt/gfs, copied
a 50 MB table there, then stressed it with various operations.
10 reads and full updates of every row in the table succeeded.

After a while, a simple "vacuum full analyze" gave the error:
glu=# vacuum full analyze;
ERROR:  could not read block 43155 of relation
527933664/527933665/527933666: Transport endpoint is not connected

I repeated the tests many times; after 2-3 minutes of operation I got
the same error, in another place but mostly in WRITE operations.
I deleted everything on the mounted client disk and rebuilt it with the
STRIPE translator instead of the AFR translator.
The behaviour is the same: after a couple of successful operations, I
got a failure.

Those were the facts; now the logs and configuration files.

The client debug log shows these errors:
[1:48:22] [CRITICAL/client-protocol.c:218/call_bail()]
client/protocol:bailing transport
[1:48:22] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write to
break on blocked socket (if any)
[1:48:22] [CRITICAL/client-protocol.c:218/call_bail()]
client/protocol:bailing transport
[1:48:22] [ERROR/common-utils.c:110/full_rwv()] libglusterfs:full_rwv:
6689 bytes r/w instead of 8539 (Broken pipe)
[1:48:22] [ERROR/client-protocol.c:204/client_protocol_xfer()]
protocol/client:transport_submit failed
[1:48:22] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write to
break on blocked socket (if any)
[1:48:22] [CRITICAL/client-protocol.c:218/call_bail()]
client/protocol:bailing transport
...
...
[1:48:22] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write to
break on blocked socket (if any)
[1:48:22] [CRITICAL/client-protocol.c:218/call_bail()]
client/protocol:bailing transport
[1:48:22] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write to
break on blocked socket (if any)
[1:48:22] [DEBUG/client-protocol.c:2708/client_protocol_interpret()]
protocol/client:frame not found for blk with callid: 139893
[1:48:22] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()]
protocol/client:cleaning up state in transport object 0x16595730
[1:48:22] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client1:
connection to server disconnected
[1:48:22] [CRITICAL/common-utils.c:215/gf_print_trace()]
debug-backtrace:Got signal (11), printing backtrace
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:/usr/lib/libglusterfs.so.0(gf_print_trace+0x21)
[0x2aaaaaccf4a1]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:/lib64/libc.so.6 [0x2aaaab53b070]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:/usr/lib/glusterfs/1.3.0-pre4/xlator/performance/read-ahead.so(ra_frame_return+0x142)
[0x2aaaac4da2a2]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:/usr/lib/glusterfs/1.3.0-pre4/xlator/performance/read-ahead.so
[0x2aaaac4d9daa]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:[glusterfs] [0x40910b]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:/usr/lib64/libfuse.so.2 [0x2aaaaaee3059]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:[glusterfs] [0x402f29]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:/usr/lib/libglusterfs.so.0(sys_epoll_iteration+0xd4)
[0x2aaaaacd0ef4]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:[glusterfs] [0x402898]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:/lib64/libc.so.6(__libc_start_main+0xf4) [0x2aaaab5288a4]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:[glusterfs] [0x4025f9]
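
A log like the one above is easier to read when the entries are counted by
severity and function. The following is a minimal sketch that assumes every
entry starts with a header of the form [time] [LEVEL/file:line/function()],
as in the excerpt; the names LOG_HEADER and summarize are made up for this
example and are not part of glusterfs.

```python
import re
from collections import Counter

# Header format taken from the excerpt: [h:mm:ss] [LEVEL/file.c:NNN/func()]
LOG_HEADER = re.compile(
    r"^\[(?P<time>\d+:\d+:\d+)\] "
    r"\[(?P<level>[A-Z]+)/(?P<file>[^:/]+):(?P<line>\d+)/(?P<func>\w+)\(\)\]"
)


def summarize(log_lines):
    """Count log entries per (level, function).

    Wrapped continuation lines do not match the header and are skipped.
    """
    counts = Counter()
    for line in log_lines:
        match = LOG_HEADER.match(line)
        if match:
            counts[(match["level"], match["func"])] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize_log.py < client.log
    for (level, func), n in summarize(sys.stdin).most_common():
        print(f"{n:6d}  {level:8s}  {func}")
```

On the excerpt above this would group the repeated call_bail() and
cont_hand() entries and separate them from the gf_print_trace() lines that
belong to the signal 11 backtrace.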

The server configuration files look like this:
-------------------
volume brick
        type storage/posix
        option directory /var/gldata
end-volume

volume server
        type protocol/server
        option transport-type tcp/server
        option listen-port 6996
        option bind-address 29.11.276.x
        subvolumes brick
        option auth.ip.brick.allow *
end-volume
-------------------

The client configuration file is:
-------------------
volume clientX            #{1,2,3}
type protocol/client
option transport-type tcp/client
option remote-host A.B.C.X
option remote-port 6996
option remote-subvolume brick
end-volume


### Add AFR feature to brick
volume afr
  type cluster/afr
  subvolumes client1 client2 client3
  option replicate *:3                 # All files 3 copies
end-volume

#volume stripe
#   type cluster/stripe
#   subvolumes client1 client2 client3
#   option block-size *:256kB
#end-volume

#volume trace
#  type debug/trace
#  subvolumes afr
#  option debug on
#end-volume

volume writebehind
   type performance/write-behind
   option aggregate-size 131072 # aggregate block size in bytes
   subvolumes afr
end-volume

volume readahead
   type performance/read-ahead
   option page-size 131072 ### size in bytes
   option page-count 16 ### page-size x page-count is the amount of read-ahead data per file
   subvolumes writebehind
end-volume
-------------------
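
The clientX block in the client file stands for three nearly identical
volumes (X = 1, 2, 3). As a rough sketch, they could be generated from one
template as shown below. The host addresses are placeholders from the
documentation range, not the real servers, and the helper name
client_volumes is hypothetical.

```python
# Template mirrors the clientX block from the client configuration file.
CLIENT_TEMPLATE = """\
volume client{n}
  type protocol/client
  option transport-type tcp/client
  option remote-host {host}
  option remote-port {port}
  option remote-subvolume brick
end-volume
"""


def client_volumes(hosts, port=6996):
    """Return one protocol/client volume per host, named client1..clientN."""
    return "\n".join(
        CLIENT_TEMPLATE.format(n=index, host=host, port=port)
        for index, host in enumerate(hosts, start=1)
    )


if __name__ == "__main__":
    # Placeholder addresses (assumption); replace with the real brick servers.
    print(client_volumes(["192.0.2.1", "192.0.2.2", "192.0.2.3"]))
```

The generated names client1, client2 and client3 match the subvolumes line
of the afr volume, so the output can be placed ahead of the rest of the
client file.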








--
Anand V. Avati

