>
> You might also want to just look at read_message, connect, and accept in
> Pipe.cc as I think those are the only places where data is read off the
> network into a buffer/struct on the stack.
>

After adding the following to the [client] section of the config file, the
problem seems to have gone away. At least, I haven't been able to reproduce
it, where previously I had figured out a way to reproduce it reliably:

[client]
    ms rwthread stack bytes = 8388608

It seems that ceph uses a stack size of 1MB in the absence of the above. I'm
not sure if the original problem was a stack overflow and this fixes it, or
if I'm just working around it by making the stack bigger (8MB here)...

Previously I'd created a wrapper around pthread_cond_wait, and had that
wrapper:
. allocate a large (256KB when stack size was 1MB) array on the stack
. fill that array with increasing values (1, 2, 3, etc)
. protect the memory pages (whole pages only) in that array with mprotect
. call pthread_cond_wait proper
. unprotect the pages
. check the array to make sure it was still intact

It appears to pretty much always be the writer wait thread whose stack was
being corrupted on wait, but nothing ever tripped the check above, and a lot
of the time the wrapper would itself crash halfway through filling the
array, always on a page boundary.

Does gcc use a guard page on the stack?

James
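
For anyone wanting to repeat the experiment, the wrapper steps listed above can be sketched roughly as follows. This is a reconstruction from the description only, not the code that was actually run: the names checked_cond_wait, run_demo, demo_state and CANARY_BYTES are made up for illustration, and the 1MB thread stack in run_demo is an assumption meant to mirror the default mentioned above.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define CANARY_BYTES (256 * 1024)   /* 256KB array, as in the description */

/* Hypothetical wrapper: returns pthread_cond_wait's result, or -1 if
 * mprotect failed or the canary array was modified during the wait. */
int checked_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex)
{
    volatile uint32_t canary[CANARY_BYTES / sizeof(uint32_t)];
    const size_t n = sizeof(canary) / sizeof(canary[0]);
    const uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);

    /* Fill the array with increasing values: 1, 2, 3, ... */
    for (size_t i = 0; i < n; i++)
        canary[i] = (uint32_t)(i + 1);

    /* Only whole pages lying entirely inside the array can be protected. */
    uintptr_t lo = ((uintptr_t)canary + page - 1) & ~(page - 1);
    uintptr_t hi = ((uintptr_t)canary + sizeof(canary)) & ~(page - 1);
    size_t len = hi > lo ? (size_t)(hi - lo) : 0;

    /* Make those pages read-only so a stray write faults immediately. */
    if (len && mprotect((void *)lo, len, PROT_READ) != 0)
        return -1;

    int rc = pthread_cond_wait(cond, mutex);

    /* Restore write access before the frame is torn down. */
    if (len && mprotect((void *)lo, len, PROT_READ | PROT_WRITE) != 0)
        return -1;

    /* Verify the array is still intact, including the unprotected edges. */
    for (size_t i = 0; i < n; i++)
        if (canary[i] != (uint32_t)(i + 1))
            return -1;

    return rc;
}

/* Hypothetical demo state shared between the waiter and the signaller. */
struct demo_state {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int ready;
    int result;
};

static void *waiter(void *arg)
{
    struct demo_state *s = arg;
    pthread_mutex_lock(&s->mu);
    while (!s->ready && s->result == 0)
        s->result = checked_cond_wait(&s->cv, &s->mu);
    pthread_mutex_unlock(&s->mu);
    return NULL;
}

/* Runs one wait/signal round on a thread with a 1MB stack (assumed size).
 * Returns 0 if the canary survived the wait. */
int run_demo(void)
{
    struct demo_state s = { PTHREAD_MUTEX_INITIALIZER,
                            PTHREAD_COND_INITIALIZER, 0, 0 };
    pthread_attr_t attr;
    pthread_t t;

    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 1024 * 1024);
    int err = pthread_create(&t, &attr, waiter, &s);
    pthread_attr_destroy(&attr);
    if (err != 0)
        return -1;

    pthread_mutex_lock(&s.mu);
    s.ready = 1;
    pthread_cond_signal(&s.cv);
    pthread_mutex_unlock(&s.mu);

    pthread_join(t, NULL);
    return s.result;
}
```

Build with -pthread and call run_demo() from a small main. Note that the protected pages sit in the wrapper's own frame, above anything pthread_cond_wait itself pushes, so this only catches writes that land in that region of the waiting thread's stack.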