On Tue, Sep 16, 2014 at 6:15 PM, Joao Eduardo Luis <joao.luis at inktank.com> wrote:
> Forcing the monitor to compact on start and restarting the mon is the
> current workaround for overgrown ssts. This happens on a regular basis with
> some clusters and I've not been able to track down the source. It seems
> that leveldb keeps hold of previous, useless data and will only relinquish
> it upon being closed.
>
> When this happens monitors do not slow down per se but they tend to
> misbehave: hanging at times, spurious elections, flapping quorum. This is
> mostly because, up until recently, the monitors would wait on updates to be
> written to leveldb. Leveldb would in turn misbehave as (afaict) it's busy
> dealing with clutter. Sage pushed patches to master to have the monitor
> perform async writes to leveldb so as to prevent the monitor hanging when
> leveldb hangs, and this should help quite a bit with all the weirdness.
>
> So to recap: the current workaround is to add 'mon compact on start = true'
> to your ceph.conf and restart the monitor.

Obrigado João, that did bring the ssts down to a bearable size in this
instance. This turned out not to be the root cause of the weird perf
issues seen on that cluster, but at least it cleaned up the mons. :)

Cheers,
Florian
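
For the archives, the workaround quoted above would look roughly like this
in ceph.conf. This is only a sketch: putting the option under [mon] is an
assumption on my part (it would then apply to every monitor that reads this
file), and the monitor still has to be restarted for the compaction to
happen.

```ini
; ceph.conf fragment (sketch, section placement is an assumption)
[mon]
    ; compact the monitor's leveldb store each time the daemon starts
    mon compact on start = true
```

Once the ssts are back to a sane size the option can presumably be removed
again, so that later restarts do not pay the compaction cost every time.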