On 08/03/2012 02:23 PM, Emmanuel Dreyfus wrote:
On Fri, Aug 03, 2012 at 05:13:02AM +0000, Emmanuel Dreyfus wrote:
It seems there is a race condition here. Can someone knowledgeable
confirm?
I tried this. It no longer crashes, but the volume gets broken: lookups
return EINVAL (log below). It is therefore probably the wrong approach,
but hints are welcome.
--- syncop.c.orig 2012-08-03 08:02:35.000000000 +0200
+++ syncop.c 2012-08-03 10:43:28.000000000 +0200
@@ -116,8 +116,10 @@
/* Do not trust the pointer received. It may be
wrong and can lead to crashes. */
task = synctask_get ();
+ assert(task != NULL);
+
task->ret = task->syncfn (task->opaque);
if (task->synccbk)
task->synccbk (task->ret, task->frame, task->opaque);
@@ -211,8 +213,14 @@
newtask->ctx.uc_stack.ss_sp = newtask->stack;
newtask->ctx.uc_stack.ss_size = env->stacksize;
+ /*
+ * synctask_wrap does not trust its argument, and
+ * uses synctask_get()
+ */
+ synctask_set (newtask);
+
makecontext (&newtask->ctx, (void *) synctask_wrap, 2, newtask);
newtask->state = SYNCTASK_INIT;
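One plausible reason synctask_wrap() distrusts its argument is that POSIX specifies the extra arguments of makecontext() as int, so a pointer passed through them can arrive wrong on 64-bit platforms. The patch sidesteps this by publishing the task in a per-thread slot before the context starts running. Below is a minimal sketch of that pattern. It assumes the slot is plain C11 thread-local storage. demo_task, demo_task_set(), demo_task_get() and run_demo_task() are hypothetical names, not glusterfs code.

```c
#define _XOPEN_SOURCE 700 /* some libcs only expose ucontext with this */
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

/* Hypothetical task type, loosely modeled on a synctask. */
struct demo_task {
    ucontext_t ctx;
    void *stack;
    int ret;
};

static ucontext_t caller_ctx;

/* Per-thread "current task" slot, standing in for synctask_set()/synctask_get(). */
static _Thread_local struct demo_task *current_task;

static void demo_task_set(struct demo_task *t) { current_task = t; }
static struct demo_task *demo_task_get(void) { return current_task; }

/* Entry point takes no arguments: the task pointer is read from the
 * per-thread slot instead of being passed through makecontext()'s
 * int-typed argument list. */
static void demo_task_wrap(void)
{
    struct demo_task *t = demo_task_get();

    assert(t != NULL); /* same check the patch adds */
    t->ret = 42;
    /* Returning resumes uc_link, i.e. back inside run_demo_task(). */
}

/* Runs one task to completion; returns its result, or -1 on error. */
int run_demo_task(void)
{
    const size_t stacksize = 64 * 1024;
    struct demo_task t = {0};

    t.stack = malloc(stacksize);
    if (t.stack == NULL || getcontext(&t.ctx) == -1) {
        free(t.stack);
        return -1;
    }
    t.ctx.uc_stack.ss_sp = t.stack;
    t.ctx.uc_stack.ss_size = stacksize;
    t.ctx.uc_link = &caller_ctx;
    makecontext(&t.ctx, demo_task_wrap, 0);

    demo_task_set(&t); /* publish the task before switching to it */
    if (swapcontext(&caller_ctx, &t.ctx) == -1)
        t.ret = -1;
    demo_task_set(NULL);

    free(t.stack);
    return t.ret;
}
```

Calling run_demo_task() switches onto the new stack, lets demo_task_wrap() find its task through the slot, and returns 42 once the context finishes. The design choice mirrors the patch: the slot must be set before the first switch, because the entry function reads it immediately.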
[2012-08-03 10:46:03.709177] E [afr-common.c:3664:afr_notify] 0-pfs-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2012-08-03 10:46:03.825505] W [dht-layout.c:186:dht_layout_search] 1-pfs-dht: no subvolume for hash (value) = 4177819066
[2012-08-03 10:46:03.825652] E [dht-common.c:1372:dht_lookup] 1-pfs-dht: Failed to get hashed subvol for /manu
[2012-08-03 10:46:03.826315] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 12944: LOOKUP() /manu => -1 (Invalid argument)
[2012-08-03 10:46:03.827107] W [dht-layout.c:186:dht_layout_search] 1-pfs-dht: no subvolume for hash (value) = 4177819066
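The "no subvolume for hash" warnings come from DHT's layout search. Each subvolume owns a range of the 32-bit hash space, and a lookup fails when a name's hash falls into a range that no subvolume claims. Below is a simplified sketch. It assumes one contiguous, inclusive range per subvolume. layout_range and layout_search() are hypothetical stand-ins, not the real dht_layout_search() or glusterfs' layout structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layout entry: one hash range per subvolume. */
struct layout_range {
    uint32_t start;
    uint32_t stop; /* inclusive */
    int subvol;    /* index of the subvolume owning this range */
};

/* Returns the subvolume whose range contains hash, or -1 when the hash
 * lands in a hole that no subvolume covers. */
int layout_search(const struct layout_range *ranges, size_t n, uint32_t hash)
{
    assert(ranges != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ranges[i].start <= hash && hash <= ranges[i].stop)
            return ranges[i].subvol;
    }
    return -1;
}
```

With two ranges covering 0x00000000-0x7fffffff and 0x80000000-0xffffffff, the hash 4177819066 from the log maps to subvolume 1. If the second range instead stopped at 0xefffffff (4026531839), the same hash would fall into the uncovered tail and the search would return -1. That matches the shape of the failing lookup on /manu.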
Looking at the logs, I suspect this is caused by bug 815227. Can you run a
'rebalance' operation and see whether everything returns to normal?
-Amar