Hi All,

We are testing an erasure-coded Ceph cluster fronted by RADOS Gateway. Recently many OSDs have been going down due to out-of-memory. Here are the details.

Description of the cluster
==========================
ceph version 0.94.2 (hammer)
32 hosts, 6 disks (OSDs) per host, so 32*6 = 192 OSDs
17024 pgs, 15 pools, 107 TB data, 57616 kobjects
167 TB used, 508 TB / 675 TB available
erasure coding: reed-solomon-van with k=10, m=5
We are using rgw as the client. Only the .rgw.buckets pool is erasure coded; the rest of the rgw metadata/index pools are replicated with size=3.

The Problem
===========
We ran a load test against this cluster. The load test simply writes 4.5 MB objects through a Locust test cluster. We observed very low throughput, with the disks saturated on IOPS. We reasoned that this is because the rgw stripe width is 4 MB, so the OSDs split each stripe into 4 MB / k = 400 KB chunks, which leads to random-IO behaviour on the disks.

To mitigate this, we changed the rgw stripe width to 40 MB (so that, after chunking, the chunk size becomes 40 MB / k = 4 MB) and modified the load test to upload 40 MB objects.

Now we observed a more serious problem. A lot of OSDs across different hosts started getting killed by the OOM killer. We saw that the memory usage of the OSDs was huge, ~10 GB per OSD. For comparison, we have a different replicated cluster with a lot more data where OSD memory usage is ~600 MB.

At this point we stopped the load test and tried to restart the individual OSDs. Even without load, an OSD's memory size grows to ~11 GB.

We ran the tcmalloc heap profiler against an OSD. Here is the graph generated by google-pprof:
http://s33.postimg.org/5w48sr3an/mygif.gif

The graph seems to indicate that most of the memory is being allocated by PGLog::readLog.

Is this expected behaviour? Is there a setting that allows us to control this? Please let us know what further steps we can take to fix the problem.

Thanks
Sharath
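
P.S. For reference, here is the back-of-the-envelope arithmetic behind the chunk sizes mentioned above, as a small Python sketch. It is only illustrative: it assumes MiB-based sizes and ignores any padding/alignment the erasure-code plugin may add to the chunks.

    # Per-OSD chunk size for an EC pool: each rgw stripe is split into
    # k data chunks, so every data OSD receives stripe_width / k bytes.
    def ec_chunk_size(stripe_width_bytes, k):
        return stripe_width_bytes / k

    MB = 1024 * 1024
    for stripe_width in (4 * MB, 40 * MB):
        chunk = ec_chunk_size(stripe_width, k=10)
        print("stripe width %2d MB -> %6.1f KB per OSD"
              % (stripe_width / MB, chunk / 1024))

    # stripe width  4 MB ->  409.6 KB per OSD
    # stripe width 40 MB -> 4096.0 KB per OSD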