On Fri, Aug 03, 2012 at 05:13:02AM +0000, Emmanuel Dreyfus wrote:
> It seems there is a race condition here. Can someone knowledgeable
> confirm?

I tried the patch below. It no longer crashes, but the volume is
broken: lookups return EINVAL (log below). It is therefore probably
the wrong approach, but hints are welcome.

--- syncop.c.orig 2012-08-03 08:02:35.000000000 +0200
+++ syncop.c 2012-08-03 10:43:28.000000000 +0200
@@ -116,8 +116,10 @@
         /* Do not trust the pointer received. It may be
            wrong and can lead to crashes. */
 
         task = synctask_get ();
+        assert(task != NULL);
+
         task->ret = task->syncfn (task->opaque);
         if (task->synccbk)
                 task->synccbk (task->ret, task->frame, task->opaque);
 
@@ -211,8 +213,14 @@
 
         newtask->ctx.uc_stack.ss_sp = newtask->stack;
         newtask->ctx.uc_stack.ss_size = env->stacksize;
 
+        /*
+         * synctask_wrap does not trust its argument, and
+         * uses synctask_get()
+         */
+        synctask_set (newtask);
+
         makecontext (&newtask->ctx, (void *) synctask_wrap, 2, newtask);
 
         newtask->state = SYNCTASK_INIT;
 

[2012-08-03 10:46:03.709177] E [afr-common.c:3664:afr_notify] 0-pfs-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2012-08-03 10:46:03.825505] W [dht-layout.c:186:dht_layout_search] 1-pfs-dht: no subvolume for hash (value) = 4177819066
[2012-08-03 10:46:03.825652] E [dht-common.c:1372:dht_lookup] 1-pfs-dht: Failed to get hashed subvol for /manu
[2012-08-03 10:46:03.826315] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 12944: LOOKUP() /manu => -1 (Invalid argument)
[2012-08-03 10:46:03.827107] W [dht-layout.c:186:dht_layout_search] 1-pfs-dht: no subvolume for hash (value) = 4177819066

-- 
Emmanuel Dreyfus
manu@xxxxxxxxxx