GlusterFS 1.3 mainline 2.5-patch-317 crashes when second AFR node goes down

Hi,

I checked out the latest source and compiled it on Debian testing.
I tried the setup with 2 servers as described at
http://www.gluster.org/docs/index.php/GlusterFS_High_Availability_Storage_with_GlusterFS
(thanks to Paul England).

Attached you can find the server and client configuration.
- I use 3 servers with Debian testing installed
- 2 servers for storage
- 1 server as a client
- fuse version:
  ii  fuse-utils   2.6.5-1  Filesystem in USErspace (utilities)
  ii  libfuse-dev  2.6.5-1  Filesystem in USErspace (development files)
  ii  libfuse2     2.6.5-1  Filesystem in USErspace library
  fuse init (API version 7.8)

Everything works until I shut down the second storage server. As soon as I then try to write a file or run ls on the client, the "glusterfsd" on server one crashes and the client gives me the error message:

ls: /mnt/gluster/data1/: Transport endpoint is not connected
ls: /mnt/gluster/data1/: Transport endpoint is not connected

The log from server 1 is attached below (error-server1.txt).

Do you have any idea where the error could be?
If you need further information please let me know.
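
From the attached backtrace it looks like glusterfsd dies with SIGSEGV immediately after afr_getxattr_cbk (afr.c:576) logs op_ret=-1 op_errno=107 (ENOTCONN). My guess, purely from the backtrace, is that a callback in the getxattr path dereferences the returned xattr dict without checking the error first. As a sketch of the suspected pattern (the function body and names are my guesses, not the real 1.3 source; the types and STACK_UNWIND come from the glusterfs headers):

   /* Suspected pattern: when the second subvolume is down, protocol/client
    * fails the call with op_errno 107 (ENOTCONN) and passes dict == NULL
    * up the stack.  A callback that reads dict before checking op_ret
    * would crash exactly like the attached backtrace. */
   static int32_t
   getxattr_cbk_sketch (call_frame_t *frame, void *cookie, xlator_t *this,
                        int32_t op_ret, int32_t op_errno, dict_t *dict)
   {
           if (op_ret == -1 || dict == NULL) {
                   /* the guard that seems to be missing somewhere:
                    * unwind the error instead of touching dict */
                   STACK_UNWIND (frame, op_ret, op_errno, NULL);
                   return 0;
           }
           /* only safe to dereference dict from here on */
           STACK_UNWIND (frame, op_ret, op_errno, dict);
           return 0;
   }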

Thanks and regards
Urban Loesch

error-server1.txt:

2007-07-18 16:50:14 E [tcp-client.c:170:tcp_connect] data1-gluster2-ds: non-blocking connect() returned: 111 (Connection refused)
2007-07-18 16:50:14 W [client-protocol.c:340:client_protocol_xfer] data1-gluster2-ds: not connected at the moment to submit frame type(0) op(22)
2007-07-18 16:50:14 D [tcp-client.c:70:tcp_connect] data1-gluster2-ds: socket fd = 3
2007-07-18 16:50:14 D [tcp-client.c:88:tcp_connect] data1-gluster2-ds: finalized on port `1023'
2007-07-18 16:50:14 D [tcp-client.c:109:tcp_connect] data1-gluster2-ds: defaulting remote-port to 6996
2007-07-18 16:50:14 D [tcp-client.c:141:tcp_connect] data1-gluster2-ds: connect on 3 in progress (non-blocking)
2007-07-18 16:50:14 E [tcp-client.c:170:tcp_connect] data1-gluster2-ds: non-blocking connect() returned: 111 (Connection refused)
2007-07-18 16:50:14 W [client-protocol.c:340:client_protocol_xfer] data1-gluster2-ds: not connected at the moment to submit frame type(0) op(20)
2007-07-18 16:50:14 E [afr.c:576:afr_getxattr_cbk] data1-ds-afr: (path=) op_ret=-1 op_errno=107
2007-07-18 16:50:14 C [common-utils.c:208:gf_print_trace] debug-backtrace: Got signal (11), printing backtrace
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(gf_print_trace+0x2b) [0xb7fc44fb]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /lib/i686/cmov/libc.so.6 [0xb7e7e208]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.pre6/xlator/protocol/server.so [0xb7607e1f]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0 [0xb7fc32a7]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.pre6/xlator/cluster/unify.so [0xb762410c]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.pre6/xlator/cluster/afr.so [0xb762cf4c]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.pre6/xlator/protocol/client.so [0xb7645da2]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.pre6/xlator/protocol/client.so [0xb763f160]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.pre6/xlator/protocol/client.so [0xb7640c9c]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.pre6/xlator/cluster/afr.so [0xb762d0da]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.pre6/xlator/cluster/unify.so(unify_getxattr+0x140) [0xb7624253]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(default_getxattr+0xe1) [0xb7fc338f]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.pre6/xlator/protocol/server.so [0xb760ce6f]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0 [0xb7fcc151]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(call_resume+0x33) [0xb7fce6b5]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.pre6/xlator/protocol/server.so [0xb760d067]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.pre6/xlator/protocol/server.so [0xb76114f1]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.pre6/xlator/protocol/server.so(notify+0xc9) [0xb7611e8b]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(transport_notify+0x62) [0xb7fc631f]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0 [0xb7fc6a28]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(sys_epoll_iteration+0x147) [0xb7fc6d0c]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(poll_iteration+0x1d) [0xb7fc654c]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: [glusterfsd] [0x8049340]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: /lib/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7e6a030]
2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace: [glusterfsd] [0x8048d51]

Client configuration:

   ### Add client feature and attach to remote subvolume
   volume gluster1
     type protocol/client
     option transport-type tcp/client     # for TCP/IP transport
     option remote-host 10.137.252.137       # IP address of the remote brick
     option remote-subvolume data1        # name of the remote volume
   end-volume
 
   volume gluster2
     type protocol/client
     option transport-type tcp/client     # for TCP/IP transport
     option remote-host 10.137.252.138    # IP address of the remote brick
     option remote-subvolume data1        # name of the remote volume
   end-volume
 
   ### Add writeback feature
   volume writeback
     type performance/write-behind
     option aggregate-size 131072 # unit in bytes
     subvolumes gluster1
   end-volume
 
   ### Add readahead feature
   volume readahead
     type performance/read-ahead
     option page-size 65536     # unit in bytes
     option page-count 16       # cache per file = page-count x page-size = 16 x 64 KiB = 1 MiB
     subvolumes writeback
   end-volume

Server configuration:

   volume data1-ds
           type storage/posix                   	# POSIX FS translator
           option directory /glusterfs/data1    	# Export this directory
   end-volume
 
   volume data1-ns
           type storage/posix                   	# POSIX FS translator
           option directory /glusterfs/namespace1       # Export this directory
   end-volume
 
   volume data1-gluster1-ds
           type protocol/client
           option transport-type tcp/client
           option remote-host 127.0.0.1
           option remote-subvolume data1-ds
   end-volume
 
   volume data1-gluster1-ns
           type protocol/client
           option transport-type tcp/client
           option remote-host 127.0.0.1
           option remote-subvolume data1-ns
   end-volume
 
   volume data1-gluster2-ds
           type protocol/client
           option transport-type tcp/client
           option remote-host 192.168.0.138
           option remote-subvolume data1-ds
   end-volume
 
   volume data1-gluster2-ns
           type protocol/client
           option transport-type tcp/client
           option remote-host 192.168.0.138
           option remote-subvolume data1-ns
   end-volume

# Add AFR to Datastorage
   volume data1-ds-afr
           type cluster/afr
           # There appears to be a bug with AFR and local posix volumes.
           # To work around it we pretend the local volume is remote, using an extra client volume (data1-gluster1-ds) that points at 127.0.0.1.
           subvolumes data1-gluster1-ds data1-gluster2-ds
           option replicate *:2
   end-volume

# Add AFR to Namespacestorage
   volume data1-ns-afr
           type cluster/afr
           # There appears to be a bug with AFR and local posix volumes.
           # To work around it we pretend the local volume is remote, using an extra client volume (data1-gluster1-ns) that points at 127.0.0.1.
           subvolumes data1-gluster1-ns data1-gluster2-ns
           option replicate *:2
   end-volume
 
# Unify
   volume data1-unify
           type cluster/unify
           subvolumes data1-ds-afr
           option namespace data1-ns-afr
           option scheduler rr
   end-volume

# Performance
   volume data1
           type performance/io-threads
           option thread-count 8
           option cache-size 64MB
           subvolumes data1-unify
   end-volume
 
   ### Add network serving capability to above brick.
   volume server
     type protocol/server
     option transport-type tcp/server     # For TCP/IP transport
     subvolumes data1
     option auth.ip.data1-ds.allow 192.168.0.*,127.0.0.1 # allow the AFR client volumes to reach data1-ds
     option auth.ip.data1-ns.allow 192.168.0.*,127.0.0.1 # allow the AFR client volumes to reach data1-ns
     option auth.ip.data1.allow * # allow any client to mount data1
   end-volume
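
For reference, I start glusterfsd on each storage server with

   glusterfsd -f <server-spec-file>

and mount on the client with

   glusterfs -f <client-spec-file> /mnt/gluster/data1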
