Hi,

After finally resolving the remapped PGs [0] I'm running into a problem where the MON stores are not trimming.

    health HEALTH_WARN
           noscrub,nodeep-scrub flag(s) set
           1 mons down, quorum 0,1 1,2
           mon.1 store is getting too big! 37115 MB >= 15360 MB
           mon.2 store is getting too big! 26327 MB >= 15360 MB

At first I thought it was due to the remapped PGs and the cluster not being active+clean, but after that was resolved the stores still wouldn't trim. Not even when a compact was forced.

I also tried to force a sync of one of the MONs. That works, but it seems the Paxos entries are not trimmed from the store. A snippet of the log from the MON which is syncing:

2016-11-03 10:18:05.354643 7f6f90988700 10 mon.3@2(synchronizing) e1 sync_reset_timeout
2016-11-03 10:18:05.368222 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync mon_sync(chunk cookie 3288334339 lc 174329061 bl 448496 bytes last_key paxos,13242098) v2
2016-11-03 10:18:05.368229 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync_chunk mon_sync(chunk cookie 3288334339 lc 174329061 bl 448496 bytes last_key paxos,13242098) v2
2016-11-03 10:18:05.379160 7f6f90988700 10 mon.3@2(synchronizing) e1 sync_reset_timeout
2016-11-03 10:18:05.387253 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync mon_sync(chunk cookie 3288334339 lc 174329061 bl 2512885 bytes last_key paxos,13242099) v2
2016-11-03 10:18:05.387260 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync_chunk mon_sync(chunk cookie 3288334339 lc 174329061 bl 2512885 bytes last_key paxos,13242099) v2
2016-11-03 10:18:05.409084 7f6f90988700 10 mon.3@2(synchronizing) e1 sync_reset_timeout
2016-11-03 10:18:05.424569 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync mon_sync(chunk cookie 3288334339 lc 174329061 bl 804569 bytes last_key paxos,13242142) v2
2016-11-03 10:18:05.424576 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync_chunk mon_sync(chunk cookie 3288334339 lc 174329061 bl 804569 bytes last_key paxos,13242142) v2
2016-11-03 10:18:05.435102 7f6f90988700 10 mon.3@2(synchronizing) e1 sync_reset_timeout
2016-11-03 10:18:05.442261 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync mon_sync(chunk cookie 3288334339 lc 174329061 bl 2522418 bytes last_key paxos,13242143) v2
2016-11-03 10:18:05.442270 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync_chunk mon_sync(chunk cookie 3288334339 lc 174329061 bl 2522418 bytes last_key paxos,13242143) v2

In the tracker [1] I found an issue which looks like this, but that issue was resolved over 3 years ago.

Looking at mon.1 for example:

root@mon1:/var/lib/ceph/mon/ceph-mon1/store.db# ls|wc -l
12769
root@mon1:/var/lib/ceph/mon/ceph-mon1/store.db# du -sh .
37G     .
root@mon1:/var/lib/ceph/mon/ceph-mon1/store.db#

To clarify, these Monitors already had their big data store under Dumpling and were recently upgraded to Firefly and Hammer.

All PGs are active+clean at the moment, but it seems that the MON stores mainly contain Paxos entries which are not trimmed:

root@mon3:/var/lib/ceph/mon# ceph-monstore-tool ceph-mon3 dump-keys|awk '{print $1}'|uniq -c
     96 auth
   1143 logm
      3 mdsmap
      1 mkfs
      1 mon_sync
      6 monitor
      3 monmap
   1158 osdmap
 358364 paxos
    656 pgmap
      6 pgmap_meta
    168 pgmap_osd
   6144 pgmap_pg
root@mon3:/var/lib/ceph/mon#

So there are 358k Paxos entries in the MON store.

Any suggestions on how to trim those from the MON store(s)?
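For completeness, this is roughly how the compaction was forced; a sketch rather than an exact transcript (mon.1/mon.2 are this cluster's IDs):

# online compaction of a monitor's store
ceph tell mon.1 compact
ceph tell mon.2 compact

# alternatively, compact at daemon start by setting in ceph.conf under [mon]:
#   mon compact on start = true
# and then restarting the monitor

In case it matters, the paxos trim settings (paxos_trim_min/max, paxos_service_trim_min/max) can be inspected through the admin socket:

ceph daemon mon.1 config show | grep -E 'paxos.*trim'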
Wido

[0]: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-November/014113.html
[1]: http://tracker.ceph.com/issues/4895