On 1/21/2015 5:56 PM, Gregory Farnum wrote:
On Mon, Jan 19, 2015 at 2:48 PM, Brian Rak <brak@xxxxxxxxxxxxxxx> wrote:
A while ago, I ran into this issue: http://tracker.ceph.com/issues/10411
I did manage to solve that by deleting the PGs; however, ever since that
issue my mon databases have been growing without bound. At the moment, I'm
up to 3404 sst files, totaling 7.4GB of space.
This appears to be causing a significant performance hit to all cluster
operations.
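For anyone comparing against their own cluster, a rough way to get the same
numbers (assuming the default mon data path of
/var/lib/ceph/mon/<cluster>-<id>) is:

# total size of each local monitor's backing store
du -sh /var/lib/ceph/mon/*/store.db
# number of leveldb .sst files in each store
for d in /var/lib/ceph/mon/*/store.db; do
    echo "$d: $(ls "$d"/*.sst | wc -l)"
done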
How can I get Ceph to clean up these files? I've tried 'ceph tell mon.X
compact', which had no effect (well, it updated the modification time on a
lot of files, but they're all still there). I don't see any other obvious
commands that would help.
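One other knob that may be worth mentioning: the mon_compact_on_start
option (it has been around since well before giant, I believe) makes the
monitor compact its store every time the daemon starts, so a rolling mon
restart doubles as a compaction pass. A sketch, assuming standard
ceph.conf handling:

# ceph.conf on the monitor hosts; restart each mon afterwards
# (the restart command depends on your init system)
[mon]
    mon compact on start = true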
I tried running 'ceph-monstore-tool --mon-store-path . --command dump-keys >
keys' (I have no idea if this is even the right direction), but it
segfaults:
# ceph-monstore-tool --mon-store-path . --command dump-keys > keys
./mon/MonitorDBStore.h: In function 'MonitorDBStore::~MonitorDBStore()'
thread 7fbea24b2760 time 2015-01-19 17:45:52.015742
./mon/MonitorDBStore.h: 630: FAILED assert(!is_open)
ceph version 0.87-73-gabdbbd6 (abdbbd6e846727385cf0a1412393bc9759dc0244)
1: (MonitorDBStore::~MonitorDBStore()+0x88) [0x4bf3c8]
2: (main()+0xdba) [0x4bbe2a]
3: (__libc_start_main()+0xfd) [0x3efc21ed5d]
4: ceph-monstore-tool() [0x4bad39]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
2015-01-19 17:45:52.015987 7fbea24b2760 -1 ./mon/MonitorDBStore.h: In
function 'MonitorDBStore::~MonitorDBStore()' thread 7fbea24b2760 time
2015-01-19 17:45:52.015742
./mon/MonitorDBStore.h: 630: FAILED assert(!is_open)
ceph version 0.87-73-gabdbbd6 (abdbbd6e846727385cf0a1412393bc9759dc0244)
1: (MonitorDBStore::~MonitorDBStore()+0x88) [0x4bf3c8]
2: (main()+0xdba) [0x4bbe2a]
3: (__libc_start_main()+0xfd) [0x3efc21ed5d]
4: ceph-monstore-tool() [0x4bad39]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
--- begin dump of recent events ---
-13> 2015-01-19 17:45:46.843470 7fbea24b2760 5 asok(0x3eb1ad0)
register_command perfcounters_dump hook 0x3eb1a80
-12> 2015-01-19 17:45:46.843483 7fbea24b2760 5 asok(0x3eb1ad0)
register_command 1 hook 0x3eb1a80
-11> 2015-01-19 17:45:46.843486 7fbea24b2760 5 asok(0x3eb1ad0)
register_command perf dump hook 0x3eb1a80
-10> 2015-01-19 17:45:46.843491 7fbea24b2760 5 asok(0x3eb1ad0)
register_command perfcounters_schema hook 0x3eb1a80
-9> 2015-01-19 17:45:46.843494 7fbea24b2760 5 asok(0x3eb1ad0)
register_command 2 hook 0x3eb1a80
-8> 2015-01-19 17:45:46.843496 7fbea24b2760 5 asok(0x3eb1ad0)
register_command perf schema hook 0x3eb1a80
-7> 2015-01-19 17:45:46.843498 7fbea24b2760 5 asok(0x3eb1ad0)
register_command config show hook 0x3eb1a80
-6> 2015-01-19 17:45:46.843501 7fbea24b2760 5 asok(0x3eb1ad0)
register_command config set hook 0x3eb1a80
-5> 2015-01-19 17:45:46.843505 7fbea24b2760 5 asok(0x3eb1ad0)
register_command config get hook 0x3eb1a80
-4> 2015-01-19 17:45:46.843508 7fbea24b2760 5 asok(0x3eb1ad0)
register_command config diff hook 0x3eb1a80
-3> 2015-01-19 17:45:46.843510 7fbea24b2760 5 asok(0x3eb1ad0)
register_command log flush hook 0x3eb1a80
-2> 2015-01-19 17:45:46.843514 7fbea24b2760 5 asok(0x3eb1ad0)
register_command log dump hook 0x3eb1a80
-1> 2015-01-19 17:45:46.843516 7fbea24b2760 5 asok(0x3eb1ad0)
register_command log reopen hook 0x3eb1a80
0> 2015-01-19 17:45:52.015987 7fbea24b2760 -1 ./mon/MonitorDBStore.h:
In function 'MonitorDBStore::~MonitorDBStore()'
It did dump some data (it crashed while printing out pgmap_pg entries);
this is a summary of what's in there:
# cat keys | awk '{print $1}' | sort | uniq -c
173 auth
1351 logm
3 mdsmap
1 mkfs
6 monitor
22 monmap
1 mon_sync
95521 osdmap
105 osd_metadata
595 paxos
534 pgmap
6 pgmap_meta
105 pgmap_osd
13121 pgmap_pg
You appear to have 95000 untrimmed osdmaps, which would be...a lot.
That's probably the cause of your store growth.
These should be trimmed (automatically, of course) as long as the
cluster is clean; if it's not, you should get it healthy, and if it is,
then there's a bug in the monitor.
-Greg
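For completeness, one way to tell whether the osdmaps are actually being
trimmed (as opposed to just compacting away dead space) is to compare the
oldest and newest committed osdmap epochs the monitors hold. If memory
serves, 'ceph report' exposes these as osdmap_first_committed and
osdmap_last_committed; treat the exact field names as an assumption and
check your own report output:

# the gap between these two is roughly how many full osdmaps the mons keep
ceph report 2>/dev/null | grep -E '"osdmap_(first|last)_committed"'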
The cluster had been clean for a while when I tried that. Oddly, I found
that running compact again against all the mons fixed the issue. I'm
not sure why I had to run it multiple times, but this seems to be resolved.
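For the record, "compact again against all the mons" was just the same
command run once per monitor; the mon IDs a/b/c below are placeholders for
your own:

# ask every monitor in turn to compact its store
for m in a b c; do
    ceph tell mon.$m compact
done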
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com