Re: pre6 hanging problems

August,
It seems to me that you were running the client under GDB, and for some reason
that particular client bailed out. While bailing out, the client raises
SIGCONT, which gdb intercepts (gdb catches all signals before letting the
signal handlers take over). The backtrace you attached is NOT a crash; you
only had to type 'c' (continue) at the gdb prompt. Most likely this is also
what produced the 'hung' effect.
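If you would rather gdb not stop there at all, you can tell it to pass the
signal straight through to the process. Something like this should work
(SIGCONT as described above; adjust if the client raises a different signal):

(gdb) handle SIGCONT nostop noprint pass
(gdb) continue

'nostop' and 'noprint' keep gdb from pausing when the signal arrives, and
'pass' forwards it to the client so its own handler still runs.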
Is this reproducible for you?

thanks,
avati

2007/7/26, August R. Wohlt <glusterfs@xxxxxxxxxxx>:

Hi all -

I have a client and server set up with the pre6 version of glusterfs. Several
times a day the client mount freezes up, as does any command that tries
to read from the mountpoint. I have to kill the glusterfs process, unmount
the directory, and remount it to get it working again.

When this happens, another glusterfs client on a different machine, connected
to the same server, does not get disconnected. So the timeout message in the
logs is confusing to me: if the server were really timing out, wouldn't the
other client be disconnected, too?

This is on CentOS 5 with fuse 2.7.0-glfs.

When it happens, here's what shows up in the client:

...
2007-07-25 09:45:59 D [inode.c:327:__active_inode] fuse/inode: activating inode(4210807), lru=0/1024
2007-07-25 09:45:59 D [inode.c:285:__destroy_inode] fuse/inode: destroy inode(4210807)
2007-07-25 12:37:26 W [client-protocol.c:211:call_bail] brick: activating bail-out. pending frames = 1. last sent = 2007-07-25 12:33:42. last received = 2007-07-25 11:42:59 transport-timeout = 120
2007-07-25 12:37:26 C [client-protocol.c:219:call_bail] brick: bailing transport
2007-07-25 12:37:26 W [client-protocol.c:4189:client_protocol_cleanup] brick: cleaning up state in transport object 0x80a03d0
2007-07-25 12:37:26 W [client-protocol.c:4238:client_protocol_cleanup] brick: forced unwinding frame type(0) op(15)
2007-07-25 12:37:26 C [tcp.c:81:tcp_disconnect] brick: connection disconnected

When it happens, here's what shows up in the server:

2007-07-25 15:37:40 E [protocol.c:346:gf_block_unserialize_transport] libglusterfs/protocol: full_read of block failed: peer (192.168.2.3:1023)
2007-07-25 15:37:40 C [tcp.c:81:tcp_disconnect] server: connection disconnected
2007-07-25 15:37:40 E [protocol.c:251:gf_block_unserialize_transport] libglusterfs/protocol: EOF from peer (192.168.2.4:1023)
2007-07-25 15:37:40 C [tcp.c:81:tcp_disconnect] server: connection disconnected

And here's the client backtrace:

(gdb) bt
#0  0x0032e7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x005a3824 in raise () from /lib/tls/libpthread.so.0
#2  0x00655b0c in tcp_bail (this=0x80a03d0) at
../../../../transport/tcp/tcp.c:146
#3  0x00695bbc in transport_bail (this=0x80a03d0) at transport.c:192
#4  0x00603a16 in call_bail (trans=0x80a03d0) at client-protocol.c:220
#5  0x00696870 in gf_timer_proc (ctx=0xbffeec30) at timer.c:119
#6  0x0059d3cc in start_thread () from /lib/tls/libpthread.so.0
#7  0x00414c3e in clone () from /lib/tls/libc.so.6


client config:

### Add client feature and attach to remote subvolume
volume brick
   type protocol/client
   option transport-type tcp/client     # for TCP/IP transport
   option remote-host 192.168.2.5       # IP address of the remote brick
   option remote-subvolume brick_1  # name of the remote volume
end-volume

# #### Add writeback feature
  volume brick-wb
    type performance/write-behind
    option aggregate-size 131072 # unit in bytes
    subvolumes brick
  end-volume
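
For what it's worth, the bail-out lines above report transport-timeout = 120,
so I could also try a larger timeout on the protocol/client volume while
debugging. A rough sketch, assuming the pre6 protocol/client translator
accepts a transport-timeout option in seconds (the option name is inferred
from the log line above; I haven't checked it against the source):

volume brick
   type protocol/client
   option transport-type tcp/client
   option remote-host 192.168.2.5
   option remote-subvolume brick_1
   option transport-timeout 300   # assumed option name; seconds before call_bail gives up
end-volume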

server config:

### Export volume "brick" with the contents of "/home/export" directory.
volume brick_1
   type storage/posix
   option directory /home/vg_3ware1/vivalog/brick_1
end-volume

volume brick_2
   type storage/posix
   option directory /home/vg_3ware1/vivalog/brick_2
end-volume

### Add network serving capability to above brick.
volume server
   type protocol/server
   option transport-type tcp/server     # For TCP/IP transport
   option bind-address 192.168.2.5     # Default is to listen on all interfaces
   subvolumes brick_1
   option auth.ip.brick_2.allow * # Allow access to "brick" volume
   option auth.ip.brick_1.allow * # Allow access to "brick" volume
end-volume

P.S. I have one server serving two volume bricks to two physically distinct
clients. I assume this is okay--that I don't need to have two separate
server declarations.
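
If the single server declaration is supposed to export both bricks, I suppose
the subvolumes line would need to list both of them. A rough sketch, assuming
protocol/server accepts multiple subvolumes (which the two auth.ip lines seem
to imply):

volume server
   type protocol/server
   option transport-type tcp/server
   option bind-address 192.168.2.5
   subvolumes brick_1 brick_2      # both bricks exported from one server declaration
   option auth.ip.brick_1.allow *
   option auth.ip.brick_2.allow *
end-volume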
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel




--
Anand V. Avati

