On Mon, Jan 19, 2015 at 2:48 PM, Brian Rak <brak@xxxxxxxxxxxxxxx> wrote:
> A while ago, I ran into this issue: http://tracker.ceph.com/issues/10411
>
> I did manage to solve that by deleting the PGs; however, ever since that
> issue my mon databases have been growing indefinitely. At the moment, I'm
> up to 3404 sst files, totaling 7.4GB of space.
>
> This appears to be causing a significant performance hit to all cluster
> operations.
>
> How can I get Ceph to clean up these files? I've tried 'ceph tell mon.X
> compact', which had no effect (well, it updated the modification time on
> a lot of files, but they're all still there). I don't see any other
> obvious commands that would help.
>
> I tried running 'ceph-monstore-tool --mon-store-path . --command dump-keys > keys'
> (I have no idea if this is even the right direction), but it segfaults:
>
> # ceph-monstore-tool --mon-store-path . --command dump-keys > keys
> ./mon/MonitorDBStore.h: In function 'MonitorDBStore::~MonitorDBStore()' thread 7fbea24b2760 time 2015-01-19 17:45:52.015742
> ./mon/MonitorDBStore.h: 630: FAILED assert(!is_open)
> ceph version 0.87-73-gabdbbd6 (abdbbd6e846727385cf0a1412393bc9759dc0244)
> 1: (MonitorDBStore::~MonitorDBStore()+0x88) [0x4bf3c8]
> 2: (main()+0xdba) [0x4bbe2a]
> 3: (__libc_start_main()+0xfd) [0x3efc21ed5d]
> 4: ceph-monstore-tool() [0x4bad39]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> 2015-01-19 17:45:52.015987 7fbea24b2760 -1 ./mon/MonitorDBStore.h: In function 'MonitorDBStore::~MonitorDBStore()' thread 7fbea24b2760 time 2015-01-19 17:45:52.015742
> ./mon/MonitorDBStore.h: 630: FAILED assert(!is_open)
>
> ceph version 0.87-73-gabdbbd6 (abdbbd6e846727385cf0a1412393bc9759dc0244)
> 1: (MonitorDBStore::~MonitorDBStore()+0x88) [0x4bf3c8]
> 2: (main()+0xdba) [0x4bbe2a]
> 3: (__libc_start_main()+0xfd) [0x3efc21ed5d]
> 4: ceph-monstore-tool() [0x4bad39]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- begin dump of recent events ---
> -13> 2015-01-19 17:45:46.843470 7fbea24b2760  5 asok(0x3eb1ad0) register_command perfcounters_dump hook 0x3eb1a80
> -12> 2015-01-19 17:45:46.843483 7fbea24b2760  5 asok(0x3eb1ad0) register_command 1 hook 0x3eb1a80
> -11> 2015-01-19 17:45:46.843486 7fbea24b2760  5 asok(0x3eb1ad0) register_command perf dump hook 0x3eb1a80
> -10> 2015-01-19 17:45:46.843491 7fbea24b2760  5 asok(0x3eb1ad0) register_command perfcounters_schema hook 0x3eb1a80
>  -9> 2015-01-19 17:45:46.843494 7fbea24b2760  5 asok(0x3eb1ad0) register_command 2 hook 0x3eb1a80
>  -8> 2015-01-19 17:45:46.843496 7fbea24b2760  5 asok(0x3eb1ad0) register_command perf schema hook 0x3eb1a80
>  -7> 2015-01-19 17:45:46.843498 7fbea24b2760  5 asok(0x3eb1ad0) register_command config show hook 0x3eb1a80
>  -6> 2015-01-19 17:45:46.843501 7fbea24b2760  5 asok(0x3eb1ad0) register_command config set hook 0x3eb1a80
>  -5> 2015-01-19 17:45:46.843505 7fbea24b2760  5 asok(0x3eb1ad0) register_command config get hook 0x3eb1a80
>  -4> 2015-01-19 17:45:46.843508 7fbea24b2760  5 asok(0x3eb1ad0) register_command config diff hook 0x3eb1a80
>  -3> 2015-01-19 17:45:46.843510 7fbea24b2760  5 asok(0x3eb1ad0) register_command log flush hook 0x3eb1a80
>  -2> 2015-01-19 17:45:46.843514 7fbea24b2760  5 asok(0x3eb1ad0) register_command log dump hook 0x3eb1a80
>  -1> 2015-01-19 17:45:46.843516 7fbea24b2760  5 asok(0x3eb1ad0) register_command log reopen hook 0x3eb1a80
>   0> 2015-01-19 17:45:52.015987 7fbea24b2760 -1 ./mon/MonitorDBStore.h: In function 'MonitorDBStore::~MonitorDBStore()'
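One note on the tool crash: per the backtrace, the FAILED assert(!is_open)
fires in MonitorDBStore's destructor, i.e. on teardown, which is why a
partial dump still comes out. Since ceph-monstore-tool opens the leveldb
store directly, the safer way to run it is against a copy of the store with
the monitor stopped. A sketch; the mon id "a", the default data path, and
the sysvinit service invocation are assumptions:

# service ceph stop mon.a
# cp -a /var/lib/ceph/mon/ceph-a /root/mon-a-copy
# ceph-monstore-tool --mon-store-path /root/mon-a-copy --command dump-keys > keys
# service ceph start mon.a

Point --mon-store-path at the same directory level that worked for the
invocation above.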
> It did dump some data (it crashed while printing out pgmap_pg entries);
> this is a summary of what's in there:
>
> # cat keys | awk '{print $1}' | sort | uniq -c
>    173 auth
>   1351 logm
>      3 mdsmap
>      1 mkfs
>      6 monitor
>     22 monmap
>      1 mon_sync
>  95521 osdmap
>    105 osd_metadata
>    595 paxos
>    534 pgmap
>      6 pgmap_meta
>    105 pgmap_osd
>  13121 pgmap_pg

You appear to have 95000 untrimmed osdmaps, which would be... a lot. That's
probably the cause of your store growth. These should be trimmed
(automatically, of course) as long as the cluster is clean; if it's not,
you should get it healthy, and if it is, then there's a bug in the monitor.
-Greg
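A quick way to check both halves of that (is the cluster actually clean,
and is the mon trimming) is to watch the gap between the first and last
committed osdmap epochs in the mon's store. A sketch; the
osdmap_first_committed / osdmap_last_committed field names are an
assumption about what 'ceph report' prints on this version:

# ceph health detail
# ceph report 2>/dev/null | grep -E '"osdmap_(first|last)_committed"'

If the cluster reports clean but osdmap_first_committed stays pinned tens
of thousands of epochs behind osdmap_last_committed, the mon isn't trimming
and that points at a monitor bug. Once trimming does kick in, 'ceph tell
mon.X compact' (or setting 'mon compact on start = true' in the [mon]
section and restarting the monitor) should let leveldb reclaim the space.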