Hi,

We're testing Ceph using a recent build from the 'next' branch (commit b40387d) and we've run into some interesting problems related to memory usage.

The setup consists of 64 OSDs (4 boxes, each with 16 disks, most of them 2TB, some 1.5TB, XFS filesystems, Debian Wheezy). After the initial mkcephfs, 'ceph -s' reports 12480 PGs in total.

To generate some load we ran 'rados -p rbd bench 28000 write -t 25' and left it going overnight. After several hours most of the OSDs had eaten up around 1GB or more of memory each, which caused thrashing on the servers (12GB of RAM per box), and eventually the OOM killer was invoked, killing many OSDs and even the SSH daemons. This seems to have caused a domino effect, and in the morning only around 18 of the OSDs were still up.

After a hard reboot of the boxes that were unresponsive, we are now in a situation in which there is simply not enough memory for the cluster to recover. That is, after restarting the OSDs, within 2 to 3 minutes many of them are using 1-1.5GB of RAM, the thrashing starts all over again, the OOM killer comes in, and things go downhill again. Effectively, the cluster is not able to recover no matter how many times we restart the daemons.

We're not using any non-default options in the OSD section of the config file, and we checked that there is free space for logging on the system partitions.

While I know that 12GB per machine can hardly be called a lot of RAM, the question I put forward is: is it reasonable for an OSD to consume this much memory in normal usage, or even in recovery situations, when there are only around ~200 PGs per OSD and only around ~3TB of objects created by rados bench? Is there a rule of thumb to estimate the amount of memory consumed as a function of PG count, object count, and perhaps the number of PGs trying to recover at a given instant? One of my concerns here is also to understand whether memory consumption during recovery is bounded and deterministic at all, or whether we're simply hitting a severe memory leak in the OSDs.

As for the monitor daemon on this cluster (running on a dedicated machine), it is currently using 3.2GB of memory, and it got back to that point in a matter of minutes after being restarted. Would it be worth testing with the changes from the wip-mon-leaks-fix branch?

We would appreciate any advice on the best way to determine whether the OSDs are leaking memory or not. We will gladly provide any config or debug info that you might be interested in, or run any tests.

Thanks in advance.

Best regards,
Cláudio
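
P.S. In case it is useful, this is the kind of thing we were planning to run on each box to track per-OSD memory over time while we test. It is just a rough sketch that reads VmRSS from /proc once a minute, assuming the daemons show up as 'ceph-osd' in /proc/<pid>/comm:

    #!/usr/bin/env python
    # Rough sketch (not part of Ceph): print a timestamped CSV line with the
    # RSS of every ceph-osd process, once a minute, so we can see whether the
    # resident size keeps growing during recovery or levels off.
    import os
    import time

    def osd_rss_kb():
        """Return {pid: VmRSS in kB} for every process whose comm is 'ceph-osd'."""
        rss = {}
        for pid in os.listdir('/proc'):
            if not pid.isdigit():
                continue
            try:
                with open('/proc/%s/comm' % pid) as f:
                    if f.read().strip() != 'ceph-osd':
                        continue
                with open('/proc/%s/status' % pid) as f:
                    for line in f:
                        if line.startswith('VmRSS:'):
                            rss[int(pid)] = int(line.split()[1])  # value is in kB
                            break
            except IOError:
                pass  # process exited while we were reading it
        return rss

    if __name__ == '__main__':
        while True:
            now = int(time.time())
            for pid, kb in sorted(osd_rss_kb().items()):
                print('%d,%d,%d' % (now, pid, kb))
            time.sleep(60)

The idea is simply to plot the output per PID and see whether RSS climbs without bound (which would suggest a leak) or plateaus (a bounded working set).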