On Thu, Jun 2, 2016 at 2:23 PM, Sharath Gururaj <sharath.g@xxxxxxxxxxxx> wrote:
> Hi All,
>
> We are testing an erasure-coded Ceph cluster fronted by RADOS Gateway.
> Recently many OSDs have been going down due to out-of-memory.
> Here are the details.
>
> Description of the cluster:
> ====================
> ceph version 0.94.2 (hammer)

Sharath, have you tried the latest hammer (v0.94.7)? Does it also have this issue?

> 32 hosts, 6 disks (OSDs) per host, so 32*6 = 192 OSDs
> 17024 PGs, 15 pools, 107 TB data, 57616 kobjects
> 167 TB used, 508 TB / 675 TB available
> erasure coding reed-solomon-van with k=10, m=5
> We are using RGW as the client. Only the .rgw.buckets pool is erasure coded;
> the rest of the RGW metadata/index pools are replicated with size=3.
>
> The Problem
> ==========
> We ran a load test against this cluster. The load test simply writes
> 4.5 MB objects through a Locust test cluster.
> We observed very low throughput, with saturation on disk IOPS.
> We reasoned that this is because the RGW stripe width is 4 MB,
> which results in the OSDs splitting each stripe into 4 MB / k = 400 KB chunks,
> which leads to random I/O behaviour.
>
> To mitigate this, we changed the RGW stripe width to 40 MB (so that, after
> chunking, the chunk size becomes 40 MB / k = 4 MB) and we modified the load
> test to upload 40 MB objects.
>
> Now we observed a more serious problem.
> A lot of OSDs across different hosts started getting killed by the OOM killer.
> We saw that the memory usage of the OSDs was huge: ~10 GB per OSD.
> For comparison, we have a different replicated cluster with a lot more
> data where OSD memory usage is ~600 MB.
>
> At this point, we stopped the load test and tried to restart the
> individual OSDs.
> Even without load, the OSD memory size grows to ~11 GB.
>
> We ran the tcmalloc heap profiler against an OSD. Here is the graph
> generated by google-pprof:
> http://s33.postimg.org/5w48sr3an/mygif.gif
>
> The graph seems to indicate that most of the memory is being allocated
> by PGLog::read_log.
> Is this expected behaviour? Is there a setting that allows us to
> control this?
>
> Please let us know what further steps we can take to fix the problem.
>
> Thanks
> Sharath

--
Regards
Kefu Chai
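P.S. A few more thoughts, in case they help.

The numbers you posted do look consistent with the PG log being the culprit: with 17024 PGs and 3 to 15 copies/shards per PG (the size=3 pools plus the k=10, m=5 EC pool), each of your 192 OSDs is hosting somewhere between a few hundred and over a thousand PG shards, and each of those keeps its own in-memory log. If each log holds the default number of entries (several thousand per PG, if I remember the hammer defaults correctly) at a few hundred bytes per entry in memory, several GB per OSD is plausible.

The knobs that bound the log length are osd_min_pg_log_entries and osd_max_pg_log_entries. Something like the snippet below should shrink it; the values are only an illustration, I have not tuned them for a cluster like yours, and the log is only trimmed gradually as new writes come in, so existing OSDs may not drop in size immediately:

    [osd]
      # illustrative values only -- please test before rolling out cluster-wide
      osd_min_pg_log_entries = 500
      osd_max_pg_log_entries = 2000

You can also inject them into running OSDs to experiment (again, the values are just an example):

    ceph tell osd.* injectargs '--osd_min_pg_log_entries 500 --osd_max_pg_log_entries 2000'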
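To confirm what your heap graph shows, you can also grab a text breakdown from a live OSD without restarting it. The OSD id, binary path, and dump file name below are placeholders; adjust them for your setup (the profile dumps normally land in the OSD's log directory):

    ceph tell osd.12 heap start_profiler
    ceph tell osd.12 heap dump
    google-pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.12.profile.0001.heap | head -30
    ceph tell osd.12 heap stop_profiler

If PGLog::read_log dominates that output as well, the pg log settings above are the right place to look.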
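On the stripe width change: just to make sure we are looking at the same knob, I am assuming you bumped "rgw obj stripe size", something like the sketch below (the section name depends on how your radosgw instance is named in ceph.conf):

    [client.radosgw.gateway]
      # 40 MB stripes, so each EC chunk becomes 40 MB / k(=10) = 4 MB
      rgw obj stripe size = 41943040

If you changed a different setting, please tell us which one.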