> On 3 November 2016 at 10:46, Wido den Hollander <wido@xxxxxxxx> wrote:
>
>
> > On 3 November 2016 at 10:42, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >
> >
> > Hi Wido,
> >
> > AFAIK mons won't trim while a cluster is in HEALTH_WARN. Unset
> > noscrub,nodeep-scrub, get that 3rd mon up, then it should trim.
> >
> The 3rd MON is back, but AFAIK the MONs only trim when all PGs are active+clean. A cluster can go into WARN state for almost any reason, e.g. old CRUSH tunables.
>
> Will give it a try though.

No, it didn't help. Health is OK, but the MON stores still will not trim. A manual compaction actually grew the store from 25GB to 39GB.

The MON stores keep holding a large number of Paxos keys.

Wido

>
> Wido
>
> > -- Dan
> >
> >
> > On Thu, Nov 3, 2016 at 10:40 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
> > > Hi,
> > >
> > > After finally resolving the remapped PGs [0] I'm running into a problem where the MON stores are not trimming.
> > >
> > >      health HEALTH_WARN
> > >             noscrub,nodeep-scrub flag(s) set
> > >             1 mons down, quorum 0,1 1,2
> > >             mon.1 store is getting too big! 37115 MB >= 15360 MB
> > >             mon.2 store is getting too big! 26327 MB >= 15360 MB
> > >
> > > At first I thought it was due to the remapped PGs and the cluster not being active+clean, but after this was resolved the stores wouldn't trim, not even when a compact was forced.
> > >
> > > I tried to force a sync of one of the MONs; that works, but it seems the Paxos entries are not trimmed from the store.
> > >
> > > A snippet of the log from the MON which is syncing:
> > >
> > > 2016-11-03 10:18:05.354643 7f6f90988700 10 mon.3@2(synchronizing) e1 sync_reset_timeout
> > > 2016-11-03 10:18:05.368222 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync mon_sync(chunk cookie 3288334339 lc 174329061 bl 448496 bytes last_key paxos,13242098) v2
> > > 2016-11-03 10:18:05.368229 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync_chunk mon_sync(chunk cookie 3288334339 lc 174329061 bl 448496 bytes last_key paxos,13242098) v2
> > > 2016-11-03 10:18:05.379160 7f6f90988700 10 mon.3@2(synchronizing) e1 sync_reset_timeout
> > > 2016-11-03 10:18:05.387253 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync mon_sync(chunk cookie 3288334339 lc 174329061 bl 2512885 bytes last_key paxos,13242099) v2
> > > 2016-11-03 10:18:05.387260 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync_chunk mon_sync(chunk cookie 3288334339 lc 174329061 bl 2512885 bytes last_key paxos,13242099) v2
> > > 2016-11-03 10:18:05.409084 7f6f90988700 10 mon.3@2(synchronizing) e1 sync_reset_timeout
> > > 2016-11-03 10:18:05.424569 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync mon_sync(chunk cookie 3288334339 lc 174329061 bl 804569 bytes last_key paxos,13242142) v2
> > > 2016-11-03 10:18:05.424576 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync_chunk mon_sync(chunk cookie 3288334339 lc 174329061 bl 804569 bytes last_key paxos,13242142) v2
> > > 2016-11-03 10:18:05.435102 7f6f90988700 10 mon.3@2(synchronizing) e1 sync_reset_timeout
> > > 2016-11-03 10:18:05.442261 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync mon_sync(chunk cookie 3288334339 lc 174329061 bl 2522418 bytes last_key paxos,13242143) v2
> > > 2016-11-03 10:18:05.442270 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync_chunk mon_sync(chunk cookie 3288334339 lc 174329061 bl 2522418 bytes last_key paxos,13242143) v2
> > >
> > > In the tracker [1] I found an issue which looks like this one, but that issue was resolved over 3 years ago.
> > >
> > > Looking at mon.1 for example:
> > >
> > > root@mon1:/var/lib/ceph/mon/ceph-mon1/store.db# ls|wc -l
> > > 12769
> > > root@mon1:/var/lib/ceph/mon/ceph-mon1/store.db# du -sh .
> > > 37G .
> > > root@mon1:/var/lib/ceph/mon/ceph-mon1/store.db#
> > >
> > > To clarify, these Monitors already had their big data store under Dumpling and were recently upgraded to Firefly and Hammer.
> > >
> > > All PGs are active+clean at the moment, but it seems that the MON stores mainly contain the Paxos entries which are not trimmed.
> > >
> > > root@mon3:/var/lib/ceph/mon# ceph-monstore-tool ceph-mon3 dump-keys|awk '{print $1}'|uniq -c
> > >      96 auth
> > >    1143 logm
> > >       3 mdsmap
> > >       1 mkfs
> > >       1 mon_sync
> > >       6 monitor
> > >       3 monmap
> > >    1158 osdmap
> > >  358364 paxos
> > >     656 pgmap
> > >       6 pgmap_meta
> > >     168 pgmap_osd
> > >    6144 pgmap_pg
> > > root@mon3:/var/lib/ceph/mon#
> > >
> > > So there are 358k Paxos entries in the MON store.
> > >
> > > Any suggestions on how to trim those from the MON store(s)?
> > >
> > > Wido
> > >
> > >
> > > [0]: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-November/014113.html
> > > [1]: http://tracker.ceph.com/issues/4895

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
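For readers who land on this thread later, here is a rough sketch of the commands involved in inspecting and compacting a mon store on a Hammer-era cluster. This is for completeness only, since a compaction was already tried above and did not shrink the store; the mon names (mon.1, ceph-mon1) and paths are taken from the examples earlier in the thread, and the exact output will differ per cluster.

# Count keys per prefix in the mon store (same command as in the original mail):
cd /var/lib/ceph/mon && ceph-monstore-tool ceph-mon1 dump-keys | awk '{print $1}' | uniq -c

# Show the paxos/compaction related settings via the mon admin socket (run on the mon host):
ceph daemon mon.1 config show | egrep 'paxos_trim|paxos_service_trim|mon_compact_on_start'

# Ask the monitor to compact its leveldb store online:
ceph tell mon.1 compact

# Or compact at startup by setting this under [mon] in ceph.conf and restarting the mon:
#   mon compact on start = true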
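Purely as an untested idea, not something confirmed anywhere in this thread: how much Paxos history a mon keeps per proposal is governed by the paxos_trim_* and paxos_service_trim_* options, so one could raise those via injectargs and then watch whether the paxos key count from dump-keys starts to drop. The values below are arbitrary examples; the defaults in this era are a few hundred.

# Illustrative values only: allow the mon to trim more paxos versions per proposal.
ceph tell mon.* injectargs '--paxos_trim_max 1000 --paxos_service_trim_max 1000'

# Re-count the paxos keys afterwards to see whether the 358k entries shrink:
cd /var/lib/ceph/mon && ceph-monstore-tool ceph-mon1 dump-keys | awk '{print $1}' | uniq -c | grep paxos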