Re: Monitors stores not trimming after upgrade from Dumpling to Hammer

> Op 3 november 2016 om 10:42 schreef Dan van der Ster <dan@xxxxxxxxxxxxxx>:
> 
> 
> Hi Wido,
> 
> AFAIK mons won't trim while a cluster is in HEALTH_WARN. Unset
> noscrub,nodeep-scrub, get that 3rd mon up, then it should trim.
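> 
> For example, a sketch using the standard ceph CLI:
> 
>   ceph osd unset noscrub
>   ceph osd unset nodeep-scrub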
> 

The 3rd MON is back up, but AFAIK the MONs trim when all PGs are active+clean. A cluster can go into HEALTH_WARN for almost any reason, e.g. old CRUSH tunables.

Will give it a try though.

Wido

> -- Dan
> 
> 
> On Thu, Nov 3, 2016 at 10:40 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
> > Hi,
> >
> > After finally resolving the remapped PGs [0], I'm running into a problem where the MON stores are not trimming.
> >
> >      health HEALTH_WARN
> >             noscrub,nodeep-scrub flag(s) set
> >             1 mons down, quorum 0,1 1,2
> >             mon.1 store is getting too big! 37115 MB >= 15360 MB
> >             mon.2 store is getting too big! 26327 MB >= 15360 MB
> >
> > At first I thought it was due to the remapped PGs and the cluster not being active+clean, but after that was resolved the stores still wouldn't trim, not even when a compact was forced.
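> >
> > (For reference, a compact can be forced along these lines; a sketch
> > using the stock commands:
> >
> >   ceph tell mon.1 compact
> >
> > or by setting mon_compact_on_start = true in ceph.conf and restarting
> > the mon.)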
> >
> > I tried forcing a sync of one of the MONs; that works, but the Paxos entries are still not trimmed from the store.
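> >
> > (A sketch of forcing that sync, assuming Hammer-era tooling: some
> > releases expose a sync_force command on the monitor admin socket,
> > e.g. ceph daemon mon.3 sync_force --yes-i-really-mean-it; check
> > ceph daemon mon.3 help for what this release actually supports.)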
> >
> > A snippet of log from the Mon which is syncing:
> >
> > 2016-11-03 10:18:05.354643 7f6f90988700 10 mon.3@2(synchronizing) e1 sync_reset_timeout
> > 2016-11-03 10:18:05.368222 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync mon_sync(chunk cookie 3288334339 lc 174329061 bl 448496 bytes last_key paxos,13242098) v2
> > 2016-11-03 10:18:05.368229 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync_chunk mon_sync(chunk cookie 3288334339 lc 174329061 bl 448496 bytes last_key paxos,13242098) v2
> > 2016-11-03 10:18:05.379160 7f6f90988700 10 mon.3@2(synchronizing) e1 sync_reset_timeout
> > 2016-11-03 10:18:05.387253 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync mon_sync(chunk cookie 3288334339 lc 174329061 bl 2512885 bytes last_key paxos,13242099) v2
> > 2016-11-03 10:18:05.387260 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync_chunk mon_sync(chunk cookie 3288334339 lc 174329061 bl 2512885 bytes last_key paxos,13242099) v2
> > 2016-11-03 10:18:05.409084 7f6f90988700 10 mon.3@2(synchronizing) e1 sync_reset_timeout
> > 2016-11-03 10:18:05.424569 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync mon_sync(chunk cookie 3288334339 lc 174329061 bl 804569 bytes last_key paxos,13242142) v2
> > 2016-11-03 10:18:05.424576 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync_chunk mon_sync(chunk cookie 3288334339 lc 174329061 bl 804569 bytes last_key paxos,13242142) v2
> > 2016-11-03 10:18:05.435102 7f6f90988700 10 mon.3@2(synchronizing) e1 sync_reset_timeout
> > 2016-11-03 10:18:05.442261 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync mon_sync(chunk cookie 3288334339 lc 174329061 bl 2522418 bytes last_key paxos,13242143) v2
> > 2016-11-03 10:18:05.442270 7f6f90988700 10 mon.3@2(synchronizing) e1 handle_sync_chunk mon_sync(chunk cookie 3288334339 lc 174329061 bl 2522418 bytes last_key paxos,13242143) v2
> >
> > In the tracker [1] I found an issue that looks similar, but it was resolved over 3 years ago.
> >
> > Looking at mon.1 for example:
> >
> > root@mon1:/var/lib/ceph/mon/ceph-mon1/store.db# ls|wc -l
> > 12769
> > root@mon1:/var/lib/ceph/mon/ceph-mon1/store.db# du -sh .
> > 37G     .
> > root@mon1:/var/lib/ceph/mon/ceph-mon1/store.db#
> >
> > To clarify: these Monitors already had large data stores under Dumpling and were recently upgraded to Firefly and then Hammer.
> >
> > All PGs are active+clean at the moment, but the MON stores seem to consist mainly of Paxos entries that are not being trimmed.
> >
> > root@mon3:/var/lib/ceph/mon# ceph-monstore-tool ceph-mon3 dump-keys|awk '{print $1}'|uniq -c
> >      96 auth
> >    1143 logm
> >       3 mdsmap
> >       1 mkfs
> >       1 mon_sync
> >       6 monitor
> >       3 monmap
> >    1158 osdmap
> >  358364 paxos
> >     656 pgmap
> >       6 pgmap_meta
> >     168 pgmap_osd
> >    6144 pgmap_pg
> > root@mon3:/var/lib/ceph/mon#
> >
> > So there are 358k Paxos entries in the Mon store.
> >
> > Any suggestions on how to trim those from the MON store(s)?
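> >
> > (Paxos trimming is governed by the paxos_trim_min/paxos_trim_max
> > options, and only a bounded number of versions is trimmed per
> > proposal, so it can take a while to catch up. A sketch of nudging it
> > along, assuming the stock option names:
> >
> >   ceph tell mon.\* injectargs '--paxos_trim_min 250 --paxos_trim_max 1000'
> >
> > followed by another forced compact once trimming has caught up.)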
> >
> > Wido
> >
> >
> > [0]: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-November/014113.html
> > [1]: http://tracker.ceph.com/issues/4895
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


