----- Original Message -----
>
>
> ----- Original Message -----
> >
> > > This seems to happen because of a race between STACK_RESET and stack
> > > statedump. Still thinking about how to fix it without taking locks
> > > around writing to file.
> >
> > Why should we still keep the stack being reset as part of the pending
> > pool of frames? Even if we had to (can't guess why?), when we remove it
> > we should do the following to prevent gf_proc_dump_pending_frames from
> > crashing.
> >
> > ...
> >
> > call_frame_t *toreset = NULL;
> >
> > LOCK (&stack->pool->lock);
> > {
> >         toreset = stack->frames;
> >         stack->frames = NULL;
> > }
> > UNLOCK (&stack->pool->lock);
> >
> > ...
> >
> > Now, perform all operations that are done on stack->frames on toreset
> > instead. Thoughts?

Here is a patch that does more than what is mentioned in the snippet above:
http://review.gluster.com/11095

This patch makes the stack use a struct list_head for its frames. That
makes frame manipulation simpler and easier to reason about, especially
since most of us are familiar with struct list_head.

Additionally, the patch fixes the race you pointed out between STACK_RESET
and gf_proc_dump_pending_frames. This is done by making STACK_RESET take
call_pool->lock, but not for long: just long enough to splice out the list
of frames that needs to be destroyed.

I checked that sparse-self-heal.t passes both on my laptop (nearly
irrelevant) and on Jenkins. Hope that solves this regression failure.

Let me know what you think.
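
For anyone skimming the thread, here is a minimal, self-contained sketch of
the splice-under-lock pattern described above. It is not the patch itself:
struct pool, struct frame, stack_reset and dump_pending_frames are
hypothetical stand-ins for call_pool_t, call_frame_t, STACK_RESET and
gf_proc_dump_pending_frames, and it uses a plain pthread mutex plus a
hand-rolled list_head instead of GlusterFS's locking and list macros.

/* Minimal sketch of the splice-under-lock idea; all names are
 * hypothetical stand-ins, not the actual GlusterFS types.
 * Build with e.g.: gcc sketch.c -lpthread */

#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

struct list_head { struct list_head *next, *prev; };

static void list_init (struct list_head *h) { h->next = h->prev = h; }

static void list_add (struct list_head *n, struct list_head *h)
{
        n->next = h->next; n->prev = h;
        h->next->prev = n; h->next = n;
}

/* Move every entry of 'list' onto 'head' and leave 'list' empty. */
static void list_splice_init (struct list_head *list, struct list_head *head)
{
        if (list->next != list) {
                struct list_head *first = list->next;
                struct list_head *last  = list->prev;

                last->next       = head->next;
                head->next->prev = last;
                head->next       = first;
                first->prev      = head;
        }
        list_init (list);
}

#define list_entry(ptr, type, member) \
        ((type *) ((char *) (ptr) - offsetof (type, member)))

struct frame {
        int              id;
        struct list_head list;       /* linked into pool->all_frames */
};

struct pool {
        pthread_mutex_t  lock;       /* stands in for call_pool->lock */
        struct list_head all_frames;
};

/* The STACK_RESET side: hold the lock only long enough to splice the
 * frames onto a private list, then destroy them outside the lock. */
static void stack_reset (struct pool *pool)
{
        struct list_head toreset;

        list_init (&toreset);

        pthread_mutex_lock (&pool->lock);
        list_splice_init (&pool->all_frames, &toreset);
        pthread_mutex_unlock (&pool->lock);

        /* Slow per-frame teardown happens with the lock dropped. */
        while (toreset.next != &toreset) {
                struct frame *f = list_entry (toreset.next,
                                              struct frame, list);
                toreset.next = f->list.next;
                free (f);
        }
}

/* The statedump side: iterate under the same lock, so it can never
 * observe frames that stack_reset is halfway through destroying. */
static void dump_pending_frames (struct pool *pool)
{
        struct list_head *p;

        pthread_mutex_lock (&pool->lock);
        for (p = pool->all_frames.next; p != &pool->all_frames; p = p->next)
                printf ("pending frame %d\n",
                        list_entry (p, struct frame, list)->id);
        pthread_mutex_unlock (&pool->lock);
}

int main (void)
{
        struct pool pool;
        int         i;

        pthread_mutex_init (&pool.lock, NULL);
        list_init (&pool.all_frames);

        for (i = 0; i < 3; i++) {
                struct frame *f = malloc (sizeof (*f));
                f->id = i;
                list_add (&f->list, &pool.all_frames);
        }

        dump_pending_frames (&pool);    /* prints three frames */
        stack_reset (&pool);
        dump_pending_frames (&pool);    /* prints nothing */

        pthread_mutex_destroy (&pool.lock);
        return 0;
}

The point to notice is that stack_reset holds the lock only for the
constant-time splice; the per-frame destruction, which may be slow, happens
on a private list the dumper can no longer reach, so neither side ever sees
a half-destroyed frame.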