Re: "store is getting too big" on monitors

Mohamed Pakkeer <mdfakkeer@xxxxxxxxx> · Tue, 17 Feb 2015 16:43:45 +0530

Hi Joao,
We followed your instructions to create the store dump:

ceph-kvstore-tool /var/lib/ceph/mon/ceph-FOO/store.db list > store.dump

For the above store's location, let's call it $STORE:

for m in osdmap pgmap; do
  for k in first_committed last_committed; do
    ceph-kvstore-tool $STORE get $m $k >> store.dump
  done
done

ceph-kvstore-tool $STORE get pgmap_meta last_osdmap_epoch >> store.dump
ceph-kvstore-tool $STORE get pgmap_meta version >> store.dump

Please find the store dump at the following link.

http://jmp.sh/LUh6iWo

-- 
Thanks & Regards    
K.Mohamed Pakkeer

On Mon, Feb 16, 2015 at 8:14 PM, Joao Eduardo Luis <joao@xxxxxxxxxx> wrote:
On 02/16/2015 12:57 PM, Mohamed Pakkeer wrote:

Hi ceph-experts,

We are getting "store is getting too big" on our test cluster. The
cluster is running the Giant release and is configured with an EC pool
to test CephFS.

cluster c2a97a2f-fdc7-4eb5-82ef-70c52f2eceb1
      health HEALTH_WARN too few pgs per osd (0 < min 20); mon.master01
store is getting too big! 15376 MB >= 15360 MB; mon.master02 store is
getting too big! 15402 MB >= 15360 MB; mon.master03 store is getting too
big! 15402 MB >= 15360 MB; clock skew detected on mon.master02, mon.master03
      monmap e3: 3 mons at
{master01=10.1.2.231:6789/0,master02=10.1.2.232:6789/0,master03=10.1.2.233:6789/0},
election epoch 38, quorum 0,1,2 master01,master02,master03
      osdmap e97396: 552 osds: 552 up, 552 in
       pgmap v354736: 0 pgs, 0 pools, 0 bytes data, 0 objects
             8547 GB used, 1953 TB / 1962 TB avail

We tried restarting the monitors with 'mon compact on start = true' as
well as manual compaction using 'ceph tell mon.FOO compact', but neither
reduced the size of store.db. We have already deleted the pools and the
MDS to start with a fresh cluster. Do we need to delete the monitors and
recreate them, or is there another way to reduce the store size?
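
For readers following along: the health line above packs each monitor's store size and the warning limit into a single string. Below is a minimal sketch for pulling those numbers out, assuming the message format stays exactly as quoted above. The helper name mon_store_sizes is made up for illustration and is not part of Ceph.

```shell
#!/bin/sh
# Health text copied from the status output above, joined onto one line.
health='HEALTH_WARN too few pgs per osd (0 < min 20); mon.master01 store is getting too big! 15376 MB >= 15360 MB; mon.master02 store is getting too big! 15402 MB >= 15360 MB; mon.master03 store is getting too big! 15402 MB >= 15360 MB; clock skew detected on mon.master02, mon.master03'

# Hypothetical helper: read health text on stdin and print one
# "<monitor> <store_mb> <limit_mb> <over_mb>" line per store warning.
mon_store_sizes() {
    tr ';' '\n' |
        sed -n 's/.*\(mon\.[^ ]*\) store is getting too big! \([0-9][0-9]*\) MB >= \([0-9][0-9]*\) MB.*/\1 \2 \3/p' |
        awk '{ print $1, $2, $3, $2 - $3 }'
}

printf '%s\n' "$health" | mon_store_sizes
```

On a live cluster the input would presumably come from 'ceph health' rather than a pasted string. For this cluster it shows each store sitting only a few tens of MB over the limit.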

Could you get us a list of all the keys in the store using 'ceph-kvstore-tool'? Instructions are in the email you quoted.

Cheers!

  -Joao

Regards,

K.Mohamed Pakkeer

On 12/10/2014 07:30 PM, Kevin Sumner wrote:

    The mons have grown another 30GB each overnight (except for 003?), which
    is quite worrying.  I ran a little bit of testing yesterday after my
    post, but not a significant amount.

    I wouldn’t expect compact on start to help this situation based on the
    name since we don’t (shouldn’t?) restart the mons regularly, but there
    appears to be no documentation on it.  We’re pretty good on disk space
    on the mons currently, but if that changes, I’ll probably use this to
    see about bringing these numbers in line.
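
The "another 30GB each" figure can be checked against the two health outputs quoted further down in this thread: the earlier 'ceph status' and the later 'ceph health detail'. Here is a small sketch of that arithmetic, with the sizes copied from those outputs. The helper name store_growth is only illustrative.

```shell
#!/bin/sh
# Store sizes in MB, copied from the two outputs quoted in this thread.
# Format per line: <monitor> <earlier_mb> <later_mb>
readings='cluster4-monitor001 45648 77365
cluster4-monitor002 56939 87868
cluster4-monitor003 28647 30359
cluster4-monitor004 60655 93414
cluster4-monitor005 57335 88232'

# Hypothetical helper: print how much each store grew between readings.
store_growth() {
    printf '%s\n' "$1" | awk '{ printf "%s +%d MB\n", $1, $3 - $2 }'
}

store_growth "$readings"
```

Four of the five stores come out at roughly +31000 to +33000 MB, while monitor003 grew by only 1712 MB, which matches the "except for 003" remark.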

This is an issue that has been seen on larger clusters, and it usually
takes a monitor restart, with 'mon compact on start = true' or manual
compaction 'ceph tell mon.FOO compact' to bring the monitor back to a
sane disk usage level.

However, I have not been able to reproduce this in order to track the
source. I'm guessing I lack the scale of the cluster, or the appropriate
workload (maybe both).

What kind of workload are you running the cluster through? You mention
cephfs, but do you have any more info you can share that could help us
reproduce this state?

Sage also fixed an issue that could potentially cause this (depending on
what is causing it in the first place) [1,2,3]. This bug, #9987, is due
to a given cached value not being updated, leading to the monitor not
removing unnecessary data, potentially causing this growth. This cached
value would be set to its proper value when the monitor is restarted
though, so a simple restart would have all this unnecessary data blown away.

Restarting the monitor ends up masking the true cause of the store
growth: whether from #9987 or from obsolete data kept by the monitor's
backing store (leveldb), either due to misuse of leveldb or due to
leveldb's nature (haven't been able to ascertain which may be at fault,
partly due to being unable to reproduce the problem).

If you are up to it, I would suggest the following approach in the hope
of determining what may be at fault:

1) 'ceph tell mon.FOO compact' -- which will force the monitor to
compact its store. This won't close leveldb, so it won't have much
effect on the store size if it happens to be leveldb holding on to some
data (I could go into further detail, but I don't think this is the
right medium).
   1.a) you may notice the store increasing in size during this period;
   it's expected.
   1.b) compaction may take a while, but in the end you'll hopefully see
   a significant reduction in size.

2) Assuming that failed, I would suggest doing the following:

2.1) grab ceph-kvstore-tool from the ceph-test package
2.2) stop the monitor
2.3) run 'ceph-kvstore-tool /var/lib/ceph/mon/ceph-FOO/store.db list >
store.dump'
2.4) run (for the above store's location, let's call it $STORE):

for m in osdmap pgmap; do
   for k in first_committed last_committed; do
     ceph-kvstore-tool $STORE get $m $k >> store.dump
   done
done

ceph-kvstore-tool $STORE get pgmap_meta last_osdmap_epoch >> store.dump
ceph-kvstore-tool $STORE get pgmap_meta version >> store.dump

2.5) send over the results of the dump
2.6) if you were to compress the store as well and send me a link to
grab it I would appreciate it.
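
Step 2.6 does not say how to compress the store, so the following is only one possible sketch using tar with gzip. The helper name pack_store and the archive name store.tar.gz are assumptions made for illustration. The store path is the one used in step 2.3.

```shell
#!/bin/sh
# Hypothetical helper: pack a monitor store directory into a gzip'd tarball.
#   $1  path to the store directory (e.g. /var/lib/ceph/mon/ceph-FOO/store.db)
#   $2  output archive name (any name will do; store.tar.gz is an assumption)
# Run it only while the monitor is stopped (step 2.2), so the files are not
# changing underneath tar.
pack_store() {
    store=$1
    archive=$2
    if [ ! -d "$store" ]; then
        echo "not a directory: $store" >&2
        return 1
    fi
    # -C keeps the archive paths relative ("store.db/...") instead of absolute.
    tar -czf "$archive" -C "$(dirname "$store")" "$(basename "$store")"
}

# Example: pack_store /var/lib/ceph/mon/ceph-FOO/store.db store.tar.gz
```

Packing the directory by its base name means the archive unpacks as a plain store.db directory wherever it is extracted.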

3) Next you could simply restart the monitor (without 'mon compact on
start = true'); if the monitor's store size decreases, then there's a
fair chance that you've been bitten by #9987. Otherwise, it may be
leveldb's clutter. You should also note that leveldb may itself compact
automatically on start, so it's hard to say for sure what fixed what.

4) If store size hasn't gone back to sane levels by now, you may wish to
restart with 'mon compact on start = true' and see if it helps. If it
doesn't, then we may have a completely different issue on our hands.
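
Steps 1, 3 and 4 above all come down to comparing the store size before and after an action. Below is a minimal sketch for recording that, assuming the store lives at the path used in step 2.3. The helper names store_size_mb and shrink_pct are hypothetical, and the numbers in the example call are made up.

```shell
#!/bin/sh
# Hypothetical helpers for tracking the store size across steps 1, 3 and 4.

# Print the on-disk size of a directory in whole MB (du -k is POSIX).
store_size_mb() {
    du -sk "$1" | awk '{ printf "%d\n", $1 / 1024 }'
}

# Print by how many percent the size dropped from $1 (before) to $2 (after).
shrink_pct() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%d\n", (before - after) * 100 / before }'
}

# Intended use around one step (not run here, it needs a real monitor):
#   STORE=/var/lib/ceph/mon/ceph-FOO/store.db
#   before=$(store_size_mb "$STORE")
#   ceph tell mon.FOO compact
#   after=$(store_size_mb "$STORE")
#   shrink_pct "$before" "$after"

# Example with made-up numbers: a store that went from 200 MB to 50 MB.
shrink_pct 200 50   # prints 75
```

Keeping one reading per step makes it easier to say which action actually reduced the store, given the caveat in step 3 that leveldb may compact by itself on start.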

Now, assuming your store size went down on step 3, and if you are
willing, it would be interesting to see if Sage's patches help out in
any way. The patches have not been backported to the giant branch yet,
so you would have to apply them yourself. For them to work you would
have to run the patched monitor as the leader. I would suggest leaving
the other monitors running an unpatched version so they could act as the
control group.

Let us know if any of this helps.

Cheers!

   -Joao

[1] - http://tracker.ceph.com/issues/9987
[2] - 093c5f0cabeb552b90d944da2c50de48fcf6f564
[3] - 3fb731b722c50672a5a9de0c86a621f5f50f2d06

    :: ~ » ceph health detail | grep 'too big'
    HEALTH_WARN mon.cluster4-monitor001 store is getting too big! 77365 MB
      >= 15360 MB; mon.cluster4-monitor002 store is getting too big! 87868 MB
      >= 15360 MB; mon.cluster4-monitor003 store is getting too big! 30359 MB
      >= 15360 MB; mon.cluster4-monitor004 store is getting too big! 93414 MB
      >= 15360 MB; mon.cluster4-monitor005 store is getting too big! 88232 MB
      >= 15360 MB
    mon.cluster4-monitor001 store is getting too big! 77365 MB >= 15360 MB
    -- 72% avail
    mon.cluster4-monitor002 store is getting too big! 87868 MB >= 15360 MB
    -- 70% avail
    mon.cluster4-monitor003 store is getting too big! 30359 MB >= 15360 MB
    -- 85% avail
    mon.cluster4-monitor004 store is getting too big! 93414 MB >= 15360 MB
    -- 69% avail
    mon.cluster4-monitor005 store is getting too big! 88232 MB >= 15360 MB
    -- 71% avail

    --

    Kevin Sumner

    ke...@xxxxxxxxx

        On Dec 9, 2014, at 6:20 PM, Haomai Wang <haomaiw...@xxxxxxxxx> wrote:

        Maybe you can enable "mon_compact_on_start=true" when restarting the
        mon; it will compact the data.

        On Wed, Dec 10, 2014 at 6:50 AM, Kevin Sumner <ke...@xxxxxxxxx> wrote:

            Hi all,

            We recently upgraded our cluster to Giant from.  Since then, we’ve been
            driving load tests against CephFS.  However, we’re getting “store is
            getting too big” warnings from the monitors and the mons have started
            consuming way more disk space, 40GB-60GB now as opposed to ~10GB
            pre-upgrade.  Is this expected?  Is there anything I can do to ease
            the store’s size?

            Thanks!

            :: ~ » ceph status
                cluster f1aefa73-b968-41e0-9a28-9a465db5f10b
                 health HEALTH_WARN mon.cluster4-monitor001 store is getting too big!
            45648 MB >= 15360 MB; mon.cluster4-monitor002 store is getting too big!
            56939 MB >= 15360 MB; mon.cluster4-monitor003 store is getting too big!
            28647 MB >= 15360 MB; mon.cluster4-monitor004 store is getting too big!
            60655 MB >= 15360 MB; mon.cluster4-monitor005 store is getting too big!
            57335 MB >= 15360 MB
                 monmap e3: 5 mons at
            {cluster4-monitor001=17.138.96.12:6789/0,cluster4-monitor002=17.138.96.13:6789/0,cluster4-monitor003=17.138.96.14:6789/0,cluster4-monitor004=17.138.96.15:6789/0,cluster4-monitor005=17.138.96.16:6789/0},
            election epoch 34938, quorum 0,1,2,3,4
            cluster4-monitor001,cluster4-monitor002,cluster4-monitor003,cluster4-monitor004,cluster4-monitor005
                 mdsmap e6538: 1/1/1 up {0=cluster4-monitor001=up:active}
                 osdmap e49500: 501 osds: 470 up, 469 in
                  pgmap v1369307: 98304 pgs, 3 pools, 4933 GB data, 1976 kobjects
                        16275 GB used, 72337 GB / 93366 GB avail
                           98304 active+clean
              client io 3463 MB/s rd, 18710 kB/s wr, 7456 op/s

            --
            Kevin Sumner
            ke...@xxxxxxxxx

            _______________________________________________
            ceph-users mailing list
            ceph-users@xxxxxxxxxxxxxx
            http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

        --

        Best Regards,

        Wheat

--

Thanks & Regards

K.Mohamed Pakkeer

Mobile- 0091-8754410114
