Hello, all:

I am planning to build a large-scale Ceph cluster in production for OpenStack, and I would like to make the cluster as large as possible.

In the following mailing-list thread, Karol asked a question about the "largest ceph cluster":

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-April/028371.html

In that thread, DreamHost and CERN both said they had built a 3 PB cluster.

Over the last few days I have read CERN's report "Ceph ~30PB Test Report":

https://cds.cern.ch/record/2015206/files/CephScaleTestMarch2015.pdf

In order to build such a large cluster, the CERN team made some changes:

1. Set the noin and noup flags before the OSDs are activated, to keep the osdmap from changing frequently.

2. Apply the following configuration, which reduces the memory consumption of the OSD and monitor daemons from ~2GB to ~500MB:

   [global]
   osd map message max = 10

   [osd]
   osd map cache size = 20
   osd map max advance = 10
   osd map share max epochs = 10
   osd pg epoch persisted max stale = 10

3. Add SSDs to the monitors, because the monitors were overloaded by too many OSD creation transactions.

4. Upgrade Ceph to Hammer to keep leveldb from growing rapidly.

It seems that CERN's 30 PB cluster is for testing only and is not yet in a production environment?

I would like to know, given the current state of Ceph, what cluster size is a good fit for a production environment. 3 PB? 30 PB? Or bigger?

And once a large-scale cluster has been built, how should it be maintained afterwards? What are the core issues with a large cluster, and what can we do to avoid the potential problems?
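For reference, item 1 in the list above can be sketched with the standard `ceph osd set` / `ceph osd unset` flags. This is only a minimal sketch of how I understand the procedure, not CERN's actual tooling: the function names and the `CEPH_BIN` variable are my own, added so the commands can be dry-run with `CEPH_BIN=echo`.

```shell
#!/bin/sh
# Sketch: hold new OSDs back while they are created in bulk, then release
# them, so the osdmap changes in a few large steps instead of many small ones.
# CEPH_BIN is a hypothetical override for illustration; it defaults to the real CLI.
CEPH_BIN="${CEPH_BIN:-ceph}"

hold_osds() {
    # noup: booting OSDs are not marked "up"
    # noin: OSDs are not automatically marked "in"
    "$CEPH_BIN" osd set noup &&
    "$CEPH_BIN" osd set noin
}

release_osds() {
    # Clear both flags once all OSDs have been created.
    "$CEPH_BIN" osd unset noup &&
    "$CEPH_BIN" osd unset noin
}
```

The intended use would be to call `hold_osds`, create the OSDs with whatever deployment tooling is in use, and then call `release_osds`. I assume that OSDs which booted while noin was set may still need to be marked in explicitly afterwards (`ceph osd in <id>`); I have not verified this on a cluster of that size.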