Thanks everyone for joining the list! As I mentioned in the ceph-users post, this list is for people running large Ceph clusters (>500 OSDs) to discuss issues and experiences that you don't run into with small clusters. Please keep conversations that are related to Ceph, but not specific to running a large cluster, on the ceph-users list.

Personally I run two different Ceph clusters that currently have over 1,300 OSDs each. Recently I've run into two different bugs which I believe most of us have either run into or will run into.

The first issue was excessive OSD maps being kept on every OSD, which wasted quite a bit of each OSD's storage (I saw up to 20%). By default 500 OSD maps are stored per OSD, but I saw up to 200,000 OSD maps on some OSDs. This was made worse by the size of the clusters, since each OSD map was ~1 MB. Here's a link to the bug:

http://tracker.ceph.com/issues/13990

It was fixed in the 0.94.8 release, but that brings me to the next bug we ran into. (A rough way to check whether your OSDs are affected is at the end of this mail.)

When we attempted to upgrade our clusters from 0.94.6 to the 0.94.9 release we saw a huge number of slow requests at the start of the upgrade. This was caused by a change to the OSD map encoding (introduced in 0.94.7) which made all the 0.94.6 OSDs request full OSD maps from the 0.94.9 mon nodes instead of incremental ones. That flooded the outgoing network connection on all the mon nodes for a couple of minutes every time the OSD map was updated. It caused all sorts of problems and even resulted in an incomplete pg at one point. The link for this bug is here:

http://tracker.ceph.com/issues/17386

The solution was to upgrade all the OSDs to 0.94.9 first and then upgrade the mon nodes. (A sketch of how we verified every OSD was on 0.94.9 before touching the mons is also below.)

Now that we're on 0.94.9 things are working pretty well, but next up is to look into upgrading the clusters to Jewel. Has anyone gone through that process and is willing to share their experiences?

Thanks,
Bryan
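
P.S. For anyone who wants to check whether they're hitting the OSD map accumulation issue, here's a rough sketch of the kind of check we ran on each OSD host. It assumes the admin sockets live in the default /var/run/ceph/ location and relies on the admin socket "status" command reporting oldest_map/newest_map (it does on our hammer clusters); treat it as a starting point rather than a polished tool.

    #!/usr/bin/env python
    # Rough check for OSD map accumulation; run on each OSD host.
    # Assumes admin sockets at the default /var/run/ceph/ceph-osd.*.asok path.
    import glob
    import json
    import subprocess

    for sock in sorted(glob.glob('/var/run/ceph/ceph-osd.*.asok')):
        out = subprocess.check_output(['ceph', 'daemon', sock, 'status']).decode()
        status = json.loads(out)
        held = status['newest_map'] - status['oldest_map']
        # Anything far above the ~500 epoch default is a sign of the bug.
        print('%s: holding %d osdmap epochs (%d..%d)' % (
            sock, held, status['oldest_map'], status['newest_map']))
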
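And here's the sort of check we used to make sure every OSD was actually running 0.94.9 before we touched the mons. It just shells out to "ceph osd ls" and "ceph tell osd.N version" and tallies the results; the exact output of "ceph tell osd.N version" varies a bit between releases, so the parsing is deliberately dumb and it just counts distinct version strings.

    #!/usr/bin/env python
    # Tally OSD versions before upgrading the mons; run from any admin node.
    import collections
    import subprocess

    osd_ids = subprocess.check_output(['ceph', 'osd', 'ls']).decode().split()
    versions = collections.Counter()
    for osd_id in osd_ids:
        try:
            out = subprocess.check_output(
                ['ceph', 'tell', 'osd.' + osd_id, 'version']).decode()
            versions[out.strip()] += 1
        except subprocess.CalledProcessError:
            # Down or unreachable OSDs still need attention before the mon upgrade.
            versions['unreachable'] += 1

    for version, count in versions.most_common():
        print('%6d  %s' % (count, version))

With 1,300+ OSDs this is slow run serially, so in practice you'd want to parallelize it, but it was enough to tell us when it was safe to move on to the mons.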