Hi,

After finally resolving the remapped PGs [0] I'm running into a problem where the MON stores are not trimming.

    health HEALTH_WARN
           noscrub,nodeep-scrub flag(s) set
           1 mons down, quorum 0,1 1,2
           mon.1 store is getting too big! 37115 MB >= 15360 MB
           mon.2 store is getting too big! 26327 MB >= 15360 MB

At first I thought it was due to the remapped PGs and the cluster not being active+clean, but after that was resolved the stores still wouldn't trim. Not even when a compact was forced.

I also tried to force a sync of one of the MONs. That works, but it seems the Paxos entries are not trimmed from the store. A snippet of the log from the MON which is syncing:

2016-11-03 10:18:05.354643 7f6f90988700 10 mon.3@2(synchronizing) e1 sync_reset_timeout
2016-11-03 10:18:05.368222 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync mon_sync(chunk cookie 3288334339 lc 174329061 bl 448496 bytes last_key paxos,13242098) v2
2016-11-03 10:18:05.368229 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync_chunk mon_sync(chunk cookie 3288334339 lc 174329061 bl 448496 bytes last_key paxos,13242098) v2
2016-11-03 10:18:05.379160 7f6f90988700 10 mon.3@2(synchronizing) e1 sync_reset_timeout
2016-11-03 10:18:05.387253 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync mon_sync(chunk cookie 3288334339 lc 174329061 bl 2512885 bytes last_key paxos,13242099) v2
2016-11-03 10:18:05.387260 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync_chunk mon_sync(chunk cookie 3288334339 lc 174329061 bl 2512885 bytes last_key paxos,13242099) v2
2016-11-03 10:18:05.409084 7f6f90988700 10 mon.3@2(synchronizing) e1 sync_reset_timeout
2016-11-03 10:18:05.424569 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync mon_sync(chunk cookie 3288334339 lc 174329061 bl 804569 bytes last_key paxos,13242142) v2
2016-11-03 10:18:05.424576 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync_chunk mon_sync(chunk cookie 3288334339 lc 174329061 bl 804569 bytes last_key paxos,13242142) v2
2016-11-03 10:18:05.435102 7f6f90988700 10 mon.3@2(synchronizing) e1 sync_reset_timeout
2016-11-03 10:18:05.442261 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync mon_sync(chunk cookie 3288334339 lc 174329061 bl 2522418 bytes last_key paxos,13242143) v2
2016-11-03 10:18:05.442270 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync_chunk mon_sync(chunk cookie 3288334339 lc 174329061 bl 2522418 bytes last_key paxos,13242143) v2

In the tracker [1] I found an issue which looks like this, but that issue was resolved over 3 years ago.

Looking at mon.1 for example:

root@mon1:/var/lib/ceph/mon/ceph-mon1/store.db# ls|wc -l
12769
root@mon1:/var/lib/ceph/mon/ceph-mon1/store.db# du -sh .
37G     .
root@mon1:/var/lib/ceph/mon/ceph-mon1/store.db#

To clarify, these Monitors already had their big data store under Dumpling and were recently upgraded to Firefly and Hammer.

All PGs are active+clean at the moment, but it seems that the MON stores mainly contain Paxos entries which are not trimmed:

root@mon3:/var/lib/ceph/mon# ceph-monstore-tool ceph-mon3 dump-keys|awk '{print $1}'|uniq -c
     96 auth
   1143 logm
      3 mdsmap
      1 mkfs
      1 mon_sync
      6 monitor
      3 monmap
   1158 osdmap
 358364 paxos
    656 pgmap
      6 pgmap_meta
    168 pgmap_osd
   6144 pgmap_pg
root@mon3:/var/lib/ceph/mon#

So there are 358k Paxos entries in the MON store.

Any suggestions on how to trim those from the MON store(s)?
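For completeness, this is roughly how the compaction was forced; a sketch rather than an exact transcript (mon.1/mon.2 are this cluster's IDs):

# online compaction of a monitor's store
ceph tell mon.1 compact
ceph tell mon.2 compact

# alternatively, compact at daemon start by setting in ceph.conf under [mon]:
#   mon compact on start = true
# and then restarting the monitor

In case it matters, the paxos trim settings (paxos_trim_min/max, paxos_service_trim_min/max) can be inspected through the admin socket:

ceph daemon mon.1 config show | grep -E 'paxos.*trim'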
Wido

[0]: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-November/014113.html
[1]: http://tracker.ceph.com/issues/4895