add-brick crashes client

Hi

I feel unlucky with release-3.3. Adding a pair of bricks to a replicated
volume crashes a client that is using the volume.

Client log is attached. Here is the glusterfsd backtrace in gdb:

Program terminated with signal 11, Segmentation fault.
#0  0xbbbc239c in synctask_wrap (old_task=0xbb711000) at syncop.c:120
120             task->ret = task->syncfn (task->opaque);
(gdb) bt
#0  0xbbbc239c in synctask_wrap (old_task=0xbb711000) at syncop.c:120
#1  0xbb8ccbe0 in swapcontext () from /lib/libc.so.12
Backtrace stopped: Not enough registers or memory available to unwind further
(gdb) print task
$1 = (struct synctask *) 0x0

This means pthread_getspecific(synctask_key) in synctask_get() returned
NULL, which I cannot explain. The logs complain about a volume being
down. That may be the cause of the problem, since I was later able to do
a live add-brick once I had restarted glusterd/glusterfsd on all bricks.

-- 
Emmanuel Dreyfus
manu@xxxxxxxxxx
[2012-08-03 06:11:13.853900] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-08-03 06:11:14.991707] I [io-cache.c:1549:check_cache_size_ok] 1-gfs-quick-read: Max cache size is 18446744069951455232
[2012-08-03 06:11:14.991834] I [io-cache.c:1549:check_cache_size_ok] 1-gfs-io-cache: Max cache size is 18446744069951455232
[2012-08-03 06:11:15.056753] I [client.c:2142:notify] 1-gfs-client-0: parent translators are ready, attempting connect on transport
[2012-08-03 06:11:15.059123] I [client.c:2142:notify] 1-gfs-client-1: parent translators are ready, attempting connect on transport
[2012-08-03 06:11:15.061329] I [client.c:2142:notify] 1-gfs-client-2: parent translators are ready, attempting connect on transport
[2012-08-03 06:11:15.063623] I [client.c:2142:notify] 1-gfs-client-3: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume gfs-client-0
  2:     type protocol/client
  3:     option remote-host silo
  4:     option remote-subvolume /export/wd3a
  5:     option transport-type tcp
  6: end-volume
  7: 
  8: volume gfs-client-1
  9:     type protocol/client
 10:     option remote-host hangar
 11:     option remote-subvolume /export/wd3a
 12:     option transport-type tcp
 13: end-volume
 14: 
 15: volume gfs-client-2
 16:     type protocol/client
 17:     option remote-host hangar
 18:     option remote-subvolume /export/wd1a
 19:     option transport-type tcp
 20: end-volume
 21: 
 22: volume gfs-client-3
 23:     type protocol/client
 24:     option remote-host hotstuff
 25:     option remote-subvolume /export/wd1a
 26:     option transport-type tcp
 27: end-volume
 28: 
 29: volume gfs-replicate-0
 30:     type cluster/replicate
 31:     subvolumes gfs-client-0 gfs-client-1
 32: end-volume
 33: 
 34: volume gfs-replicate-1
 35:     type cluster/replicate
 36:     subvolumes gfs-client-2 gfs-client-3
 37: end-volume
 38: 
 39: volume gfs-dht
 40:     type cluster/distribute
 41:     subvolumes gfs-replicate-0 gfs-replicate-1
 42: end-volume
 43: 
 44: volume gfs-write-behind
 45:     type performance/write-behind
 46:     subvolumes gfs-dht
 47: end-volume
 48: 
 49: volume gfs-read-ahead
 50:     type performance/read-ahead
 51:     subvolumes gfs-write-behind
 52: end-volume
 53: 
 54: volume gfs-io-cache
 55:     type performance/io-cache
 56:     subvolumes gfs-read-ahead
 57: end-volume
 58: 
 59: volume gfs-quick-read
 60:     type performance/quick-read
 61:     subvolumes gfs-io-cache
 62: end-volume
 63: 
 64: volume gfs-md-cache
 65:     type performance/md-cache
 66:     subvolumes gfs-quick-read
 67: end-volume
 68: 
 69: volume gfs
 70:     type debug/io-stats
 71:     option latency-measurement off
 72:     option count-fop-hits off
 73:     subvolumes gfs-md-cache
 74: end-volume

+------------------------------------------------------------------------------+
[2012-08-03 06:11:15.070451] I [rpc-clnt.c:1660:rpc_clnt_reconfig] 1-gfs-client-0: changing port to 24010 (from 0)
[2012-08-03 06:11:16.240641] E [client-handshake.c:1717:client_query_portmap_cbk] 1-gfs-client-3: failed to get the port number for remote subvolume
[2012-08-03 06:11:16.240890] I [client.c:2090:client_rpc_notify] 1-gfs-client-3: disconnected
[2012-08-03 06:11:16.533693] E [client-handshake.c:1717:client_query_portmap_cbk] 1-gfs-client-2: failed to get the port number for remote subvolume
[2012-08-03 06:11:16.533963] I [rpc-clnt.c:1660:rpc_clnt_reconfig] 1-gfs-client-1: changing port to 24010 (from 0)
[2012-08-03 06:11:16.534166] I [client.c:2090:client_rpc_notify] 1-gfs-client-2: disconnected
[2012-08-03 06:11:16.534363] E [afr-common.c:3664:afr_notify] 1-gfs-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2012-08-03 06:11:18.609639] I [client-handshake.c:1636:select_server_supported_programs] 1-gfs-client-0: Using Program GlusterFS 3.3git, Num (1298437), Version (330)
[2012-08-03 06:11:18.613869] I [client-handshake.c:1433:client_setvolume_cbk] 1-gfs-client-0: Connected to 192.0.2.99:24010, attached to remote volume '/export/wd3a'.
[2012-08-03 06:11:18.614028] I [client-handshake.c:1445:client_setvolume_cbk] 1-gfs-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2012-08-03 06:11:18.614483] I [afr-common.c:3627:afr_notify] 1-gfs-replicate-0: Subvolume 'gfs-client-0' came back up; going online.
[2012-08-03 06:11:18.615776] I [client-handshake.c:453:client_set_lk_version_cbk] 1-gfs-client-0: Server lk version = 1
[2012-08-03 06:11:19.625116] I [rpc-clnt.c:1660:rpc_clnt_reconfig] 1-gfs-client-2: changing port to 24011 (from 0)
[2012-08-03 06:11:19.626509] I [client-handshake.c:1636:select_server_supported_programs] 1-gfs-client-1: Using Program GlusterFS 3.3git, Num (1298437), Version (330)
[2012-08-03 06:11:19.627991] I [rpc-clnt.c:1660:rpc_clnt_reconfig] 1-gfs-client-3: changing port to 24009 (from 0)
[2012-08-03 06:11:19.628392] I [client-handshake.c:1433:client_setvolume_cbk] 1-gfs-client-1: Connected to 192.0.2.98:24010, attached to remote volume '/export/wd3a'.
[2012-08-03 06:11:19.628606] I [client-handshake.c:1445:client_setvolume_cbk] 1-gfs-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2012-08-03 06:11:19.664120] I [fuse-bridge.c:4193:fuse_graph_setup] 0-fuse: switched to graph 1
[2012-08-03 06:11:19.665868] I [client-handshake.c:453:client_set_lk_version_cbk] 1-gfs-client-1: Server lk version = 1
[2012-08-03 06:11:19.669841] I [afr-common.c:1964:afr_set_root_inode_on_first_lookup] 1-gfs-replicate-0: added root inode
[2012-08-03 06:11:19.671492] I [dht-layout.c:593:dht_layout_normalize] 1-gfs-dht: found anomalies in /. holes=1 overlaps=0
[2012-08-03 06:11:19.672057] W [dht-selfheal.c:875:dht_selfheal_directory] 1-gfs-dht: 1 subvolumes down -- not fixing
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-08-03 06:11:19
configuration details:
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
spinlock 1
extattr.h 1
xattr.h 1
st_atimespec.tv_nsec 1
package-string: glusterfs 3.3git
