Re: Why is my mon store.db is 220GB?

Joao Eduardo Luis <joao.luis@xxxxxxxxxxx> · Mon, 12 Aug 2013 16:49:02 -0700

Following a discussion we had today on #ceph, I've added some extra
functionality to 'ceph-monstore-tool' to allow copying the data out of a
store into a new mon store. It can be found on branch wip-monstore-copy.

Use it as

  ceph-monstore-tool --mon-store-path <mon-data-dir> --out <mon-data-out> --command store-copy

where mon-data-dir is the mon data dir where the current monitor lives
(say, /var/lib/ceph/mon/ceph-a) and mon-data-out is another directory.
This last directory should be empty, allowing the tool to create a new
store. If a store already exists there, the tool will not error out; it
will instead copy the keys from the first store into the already
existing store, so beware!

Also, bear in mind that you must stop the monitor while doing
this -- the tool won't work otherwise.
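
For anyone who wants to script this, the two caveats above (an empty
output directory and a stopped monitor) can be checked before the copy
runs. What follows is only a rough sketch: the function names are made
up for illustration, and the pgrep check is a crude assumption that
matches any ceph-mon process on the host rather than one specific
monitor id. Only the ceph-monstore-tool invocation itself is taken from
the description above.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; none of these names are part of Ceph.

# Succeeds only if the argument is an existing directory with no entries.
dir_is_empty() {
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# Succeeds if no process named exactly "ceph-mon" is running on this host.
# Assumption: a crude host-wide check, not tied to a particular monitor id.
mon_is_stopped() {
    ! pgrep -x ceph-mon >/dev/null 2>&1
}

# Runs the store copy only when both checks pass.
safe_store_copy() {
    src=$1
    out=$2
    mkdir -p "$out" || return 1
    dir_is_empty "$out" || {
        echo "refusing: $out is not empty" >&2
        return 1
    }
    mon_is_stopped || {
        echo "refusing: a ceph-mon process is still running" >&2
        return 1
    }
    ceph-monstore-tool --mon-store-path "$src" --out "$out" --command store-copy
}
```

With these defined, something like
'safe_store_copy /var/lib/ceph/mon/ceph-a <mon-data-out>' would refuse
to run against a non-empty output directory instead of silently merging
keys into an existing store.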

Anyway, this should allow you to grab all your data from the current 
monitor.  You'll be presented with a few stats when the store finishes 
being copied, and hopefully you'll see that the tool didn't copy 220GB 
worth of data -- should be considerably less!
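
To cross-check the stats the tool prints, the on-disk size of the two
directories can also be compared with du. This is a small sketch with
hypothetical helper names, not something the tool provides:

```shell
#!/bin/sh
# Hypothetical helper: prints a directory's disk usage in kilobytes.
store_size_kb() {
    du -sk "$1" | cut -f1
}

# Prints the size of the original store and of the copied store.
compare_stores() {
    old_kb=$(store_size_kb "$1")
    new_kb=$(store_size_kb "$2")
    echo "original: ${old_kb} KB, copy: ${new_kb} KB"
}
```

For example, 'compare_stores /var/lib/ceph/mon/ceph-a <mon-data-out>'
would print both sizes on one line, which makes it easy to see how much
of the 220GB was actually live data.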

Let me know if this works out for you.

  -Joao

On 07/08/13 15:14, Jeppesen, Nelson wrote:
Joao,

Have you had a chance to look at my monitor issues? I ran 'ceph-mon -i FOO --compact' last week but it did not improve disk usage.

Let me know if there's anything else I can dig up. The monitor is still at 0.67-rc2, with the OSDs at 0.61.7.

On 08/02/2013 12:15 AM, Jeppesen, Nelson wrote:
Thanks for the reply, but how can I fix this without an outage?

I tried adding 'mon compact on start = true' but the monitor just hung. Unfortunately this is a production cluster and can't take the outages (I'm assuming the cluster will fail without a monitor). I had three monitors, but I was hit with the store.db bug and lost two of the three.

I have tried running with 0.61.5, 0.61.7 and 0.67-rc2. None of them seem to shrink the DB.

My guess is that the compaction policies we are enforcing won't cover
the portions of the store that haven't been compacted *prior* to the
upgrade.

Even today we still know of users with stores growing over dozens of
GBs, requiring occasional restarts to compact (which is far from an
acceptable fix).  Some of these stores can take several minutes to
compact when the monitors are restarted, although these guys can often
mitigate any down time by restarting monitors one at a time while
maintaining quorum.  Unfortunately you don't have that luxury. :-\

If however you are willing to manually force a compaction, you should be
able to do so with 'ceph-mon -i FOO --compact'.

Now, there is a possibility this is why you've been unable to add other
monitors to the cluster.  Chances are that the iterators used to
synchronize the store get stuck, or move slowly enough to cause all sorts
of funny timeouts to be triggered.

I intend to look into your issue (especially the problems with adding
new monitors) in the morning to better assess what's happening.

    -Joao

-----Original Message-----
From: Mike Dawson [mailto:mike.dawson at cloudapt.com]
Sent: Thursday, August 01, 2013 4:10 PM
To: Jeppesen, Nelson
Cc: ceph-users at lists.ceph.com
Subject: Re:  Why is my mon store.db is 220GB?

220GB is way, way too big. I suspect your monitors need to go through a successful leveldb compaction. The early releases of Cuttlefish suffered several issues with store.db growing unbounded. Most were fixed by 0.61.5, I believe.

You may have luck stopping all Ceph daemons, then starting the monitor by itself. When there were bugs, leveldb compaction tended to work better without OSD traffic hitting the monitors. Also, there are some settings to force a compaction, like 'mon compact on start = true' and 'mon compact on trim = true'. I don't think either is required anymore though. See some history here:

http://tracker.ceph.com/issues/4895

Thanks,

Mike Dawson
Co-Founder & Director of Cloud Architecture Cloudapt LLC
6330 East 75th Street, Suite 170
Indianapolis, IN 46250

On 8/1/2013 6:52 PM, Jeppesen, Nelson wrote:
My mon store.db has been at 220GB for a few months now. Why is this
and how can I fix it? I have one monitor in this cluster and I suspect
that I can't add monitors to the cluster because it is too big. Thank you.

_______________________________________________
ceph-users mailing list
ceph-users at lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com