Following up on the discussion on IRC late last night:

On x86_64 2.6 kernels, pthread_create seems to allocate a stack of 8192KB by default for each newly created thread. Since there can potentially be a large number of SimpleMessenger::Pipe instances (for example, when there are many OSDs that need to heartbeat each other), and each instance has a reader and a writer thread, a system can quickly run out of available memory to create new threads.

A short-term solution would be to decrease the amount of stack space allocated for the reader and writer threads. I guess something along the lines of:

http://github.com/tcloud/ceph/commit/39ffa236f3de2082c475a5ea5edc8afa09941bd6
and
http://github.com/tcloud/ceph/commit/1dbd42a5c4b064c581ddc152d41b9553f346df8a

Yehudasa suggested a stack size of 512KB, and it seems to work fine.

However, as the cluster grows, we will eventually hit a hard limit on either the number of concurrent threads or the number of concurrent TCP connections. Is it possible to redesign SimpleMessenger and/or the heartbeat mechanism so that only a constant number of connections are established?

-Paul C
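
For reference, here is a minimal sketch of the general technique behind the short-term fix: passing a pthread attribute object with a smaller stack size to pthread_create. It is not the code from the commits linked above, and the names create_thread_with_stack and mark_ran are hypothetical, made up for illustration. As a rough illustration of the savings, assuming a hypothetical 1000 pipes with two threads each, an 8MB default stack reserves roughly 16GB of address space, while 512KB stacks reserve about 1GB.

```cpp
#include <pthread.h>
#include <cstddef>

// Hypothetical helper (not Ceph code): start fn(arg) on a joinable thread
// whose stack is stack_bytes instead of the process default.
// Returns 0 on success, or the pthread error number on failure.
int create_thread_with_stack(pthread_t *tid, std::size_t stack_bytes,
                             void *(*fn)(void *), void *arg) {
  pthread_attr_t attr;
  int r = pthread_attr_init(&attr);
  if (r != 0)
    return r;

  // Fails with EINVAL if stack_bytes is below PTHREAD_STACK_MIN.
  r = pthread_attr_setstacksize(&attr, stack_bytes);
  if (r == 0)
    r = pthread_create(tid, &attr, fn, arg);

  // The attribute object is no longer needed once the thread exists.
  pthread_attr_destroy(&attr);
  return r;
}

// Trivial thread body for demonstration: records that it ran.
void *mark_ran(void *arg) {
  *static_cast<int *>(arg) = 1;
  return nullptr;
}
```

A caller would use it as create_thread_with_stack(&tid, 512 * 1024, mark_ran, &flag) followed by pthread_join(tid, nullptr). Whether 512KB is enough depends on how deep the thread's call chain goes and how much it keeps on the stack, so the value needs testing under real load.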