Hi

I feel unlucky with release-3.3. Adding a pair of bricks to a replicated volume crashes a client that is using the volume. The client log is attached.

Here is the glusterfsd backtrace in gdb:

Program terminated with signal 11, Segmentation fault.
#0  0xbbbc239c in synctask_wrap (old_task=0xbb711000) at syncop.c:120
120             task->ret = task->syncfn (task->opaque);
(gdb) bt
#0  0xbbbc239c in synctask_wrap (old_task=0xbb711000) at syncop.c:120
#1  0xbb8ccbe0 in swapcontext () from /lib/libc.so.12
Backtrace stopped: Not enough registers or memory available to unwind further
(gdb) print task
$1 = (struct synctask *) 0x0

This means pthread_getspecific(synctask_key) in synctask_get() returned NULL, which I cannot explain.

I see the logs complain about a volume being down. This may be the cause of the problem, since I was able to do a live brick-add later, once I had restarted glusterd/glusterfsd on all bricks.

-- 
Emmanuel Dreyfus
manu@xxxxxxxxxx
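For reference, here is a minimal sketch of one general way pthread_getspecific() can return NULL: the value bound to a key is per-thread, so a thread that never called pthread_setspecific() on that key reads NULL even if another thread has set it. All names below (demo_task, demo_key, demo_task_get and so on) are hypothetical stand-ins, not the GlusterFS code, and this is not a claim about the actual cause of the crash above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a task structure; not the GlusterFS definition. */
struct demo_task {
    int ret;
};

static pthread_key_t demo_key;
static pthread_once_t demo_key_once = PTHREAD_ONCE_INIT;

/* Create the key exactly once, with no destructor. */
static void demo_key_init(void)
{
    int rc = pthread_key_create(&demo_key, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Return the value bound to demo_key in the calling thread, or NULL if none. */
static struct demo_task *demo_task_get(void)
{
    pthread_once(&demo_key_once, demo_key_init);
    return pthread_getspecific(demo_key);
}

/* Thread body: reads the key without ever setting it in this thread. */
static void *read_without_set(void *arg)
{
    (void)arg;
    return demo_task_get();
}

/*
 * Bind a task in the calling thread, then read the key from a fresh thread.
 * Returns 1 if the fresh thread saw NULL, 0 if it saw a value, -1 on error.
 */
static int other_thread_sees_null(void)
{
    static struct demo_task task;
    pthread_t tid;
    void *seen = &task;

    pthread_once(&demo_key_once, demo_key_init);
    if (pthread_setspecific(demo_key, &task) != 0)
        return -1;
    if (pthread_create(&tid, NULL, read_without_set, NULL) != 0)
        return -1;
    if (pthread_join(tid, &seen) != 0)
        return -1;
    return seen == NULL;
}
```

Calling other_thread_sees_null() from main() is expected to return 1, while demo_task_get() in the calling thread still returns the bound pointer. A caller that dereferences the result without checking it, as in task->ret above, would fault on the NULL case.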
[2012-08-03 06:11:13.853900] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-08-03 06:11:14.991707] I [io-cache.c:1549:check_cache_size_ok] 1-gfs-quick-read: Max cache size is 18446744069951455232
[2012-08-03 06:11:14.991834] I [io-cache.c:1549:check_cache_size_ok] 1-gfs-io-cache: Max cache size is 18446744069951455232
[2012-08-03 06:11:15.056753] I [client.c:2142:notify] 1-gfs-client-0: parent translators are ready, attempting connect on transport
[2012-08-03 06:11:15.059123] I [client.c:2142:notify] 1-gfs-client-1: parent translators are ready, attempting connect on transport
[2012-08-03 06:11:15.061329] I [client.c:2142:notify] 1-gfs-client-2: parent translators are ready, attempting connect on transport
[2012-08-03 06:11:15.063623] I [client.c:2142:notify] 1-gfs-client-3: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume gfs-client-0
  2:     type protocol/client
  3:     option remote-host silo
  4:     option remote-subvolume /export/wd3a
  5:     option transport-type tcp
  6: end-volume
  7:
  8: volume gfs-client-1
  9:     type protocol/client
 10:     option remote-host hangar
 11:     option remote-subvolume /export/wd3a
 12:     option transport-type tcp
 13: end-volume
 14:
 15: volume gfs-client-2
 16:     type protocol/client
 17:     option remote-host hangar
 18:     option remote-subvolume /export/wd1a
 19:     option transport-type tcp
 20: end-volume
 21:
 22: volume gfs-client-3
 23:     type protocol/client
 24:     option remote-host hotstuff
 25:     option remote-subvolume /export/wd1a
 26:     option transport-type tcp
 27: end-volume
 28:
 29: volume gfs-replicate-0
 30:     type cluster/replicate
 31:     subvolumes gfs-client-0 gfs-client-1
 32: end-volume
 33:
 34: volume gfs-replicate-1
 35:     type cluster/replicate
 36:     subvolumes gfs-client-2 gfs-client-3
 37: end-volume
 38:
 39: volume gfs-dht
 40:     type cluster/distribute
 41:     subvolumes gfs-replicate-0 gfs-replicate-1
 42: end-volume
 43:
 44: volume gfs-write-behind
 45:     type performance/write-behind
 46:     subvolumes gfs-dht
 47: end-volume
 48:
 49: volume gfs-read-ahead
 50:     type performance/read-ahead
 51:     subvolumes gfs-write-behind
 52: end-volume
 53:
 54: volume gfs-io-cache
 55:     type performance/io-cache
 56:     subvolumes gfs-read-ahead
 57: end-volume
 58:
 59: volume gfs-quick-read
 60:     type performance/quick-read
 61:     subvolumes gfs-io-cache
 62: end-volume
 63:
 64: volume gfs-md-cache
 65:     type performance/md-cache
 66:     subvolumes gfs-quick-read
 67: end-volume
 68:
 69: volume gfs
 70:     type debug/io-stats
 71:     option latency-measurement off
 72:     option count-fop-hits off
 73:     subvolumes gfs-md-cache
 74: end-volume
+------------------------------------------------------------------------------+
[2012-08-03 06:11:15.070451] I [rpc-clnt.c:1660:rpc_clnt_reconfig] 1-gfs-client-0: changing port to 24010 (from 0)
[2012-08-03 06:11:16.240641] E [client-handshake.c:1717:client_query_portmap_cbk] 1-gfs-client-3: failed to get the port number for remote subvolume
[2012-08-03 06:11:16.240890] I [client.c:2090:client_rpc_notify] 1-gfs-client-3: disconnected
[2012-08-03 06:11:16.533693] E [client-handshake.c:1717:client_query_portmap_cbk] 1-gfs-client-2: failed to get the port number for remote subvolume
[2012-08-03 06:11:16.533963] I [rpc-clnt.c:1660:rpc_clnt_reconfig] 1-gfs-client-1: changing port to 24010 (from 0)
[2012-08-03 06:11:16.534166] I [client.c:2090:client_rpc_notify] 1-gfs-client-2: disconnected
[2012-08-03 06:11:16.534363] E [afr-common.c:3664:afr_notify] 1-gfs-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2012-08-03 06:11:18.609639] I [client-handshake.c:1636:select_server_supported_programs] 1-gfs-client-0: Using Program GlusterFS 3.3git, Num (1298437), Version (330)
[2012-08-03 06:11:18.613869] I [client-handshake.c:1433:client_setvolume_cbk] 1-gfs-client-0: Connected to 192.0.2.99:24010, attached to remote volume '/export/wd3a'.
[2012-08-03 06:11:18.614028] I [client-handshake.c:1445:client_setvolume_cbk] 1-gfs-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2012-08-03 06:11:18.614483] I [afr-common.c:3627:afr_notify] 1-gfs-replicate-0: Subvolume 'gfs-client-0' came back up; going online.
[2012-08-03 06:11:18.615776] I [client-handshake.c:453:client_set_lk_version_cbk] 1-gfs-client-0: Server lk version = 1
[2012-08-03 06:11:19.625116] I [rpc-clnt.c:1660:rpc_clnt_reconfig] 1-gfs-client-2: changing port to 24011 (from 0)
[2012-08-03 06:11:19.626509] I [client-handshake.c:1636:select_server_supported_programs] 1-gfs-client-1: Using Program GlusterFS 3.3git, Num (1298437), Version (330)
[2012-08-03 06:11:19.627991] I [rpc-clnt.c:1660:rpc_clnt_reconfig] 1-gfs-client-3: changing port to 24009 (from 0)
[2012-08-03 06:11:19.628392] I [client-handshake.c:1433:client_setvolume_cbk] 1-gfs-client-1: Connected to 192.0.2.98:24010, attached to remote volume '/export/wd3a'.
[2012-08-03 06:11:19.628606] I [client-handshake.c:1445:client_setvolume_cbk] 1-gfs-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2012-08-03 06:11:19.664120] I [fuse-bridge.c:4193:fuse_graph_setup] 0-fuse: switched to graph 1
[2012-08-03 06:11:19.665868] I [client-handshake.c:453:client_set_lk_version_cbk] 1-gfs-client-1: Server lk version = 1
[2012-08-03 06:11:19.669841] I [afr-common.c:1964:afr_set_root_inode_on_first_lookup] 1-gfs-replicate-0: added root inode
[2012-08-03 06:11:19.671492] I [dht-layout.c:593:dht_layout_normalize] 1-gfs-dht: found anomalies in /. holes=1 overlaps=0
[2012-08-03 06:11:19.672057] W [dht-selfheal.c:875:dht_selfheal_directory] 1-gfs-dht: 1 subvolumes down -- not fixing
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-08-03 06:11:19
configuration details:
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
spinlock 1
extattr.h 1
xattr.h 1
st_atimespec.tv_nsec 1
package-string: glusterfs 3.3git