Hi Chris,
Unfortunately "bluefs stats" is of a little help so far. It's not that
verbose when single disk per osd is in use. :(
Instead it would be nice to get the output of the
'ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-<id> --command
bluefs-log-dump' command, to be executed against an offline OSD.
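A rough sketch of how to run that, assuming a non-containerized
deployment where the OSD runs under the usual ceph-osd@<id> systemd
unit and you capture the output to a file (the output file name is
just a placeholder):

systemctl stop ceph-osd@<id>
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-<id> --command bluefs-log-dump > bluefs-log-dump-<id>.txt
systemctl start ceph-osd@<id>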
Inspecting osd.1 is not that helpful either. You'd better get all the
information (startup OSD log, kvstore-tool stats, bluestore-tool
output, etc.) from a different OSD which keeps large metadata.
W.r.t. sending logs over cloud storage - could you please use
transfer.sh instead of WeTransfer?
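Something along these lines should do, assuming curl is available on
the host (the file name below is just a placeholder; transfer.sh
prints a download link on success):

curl --upload-file ./osd-1.log https://transfer.sh/osd-1.log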
Thanks,
Igor
On 3/22/2022 4:43 PM, Chris Page wrote:
Hi Igor,
Thanks for your email and your assistance.
> And IIUC you've got custom rocksdb settings, right? What's the
rationale for that? I would strongly discourage altering them without
a deep understanding of the consequences...
This was a recommended configuration which, I must admit, I didn't
have enough knowledge of to be applying.
- - - - - - - -
> Could you please share the output for the following command:
ceph tell osd.1 bluefs stats
1 : device size 0x37e3ec00000 : using 0x61230f9000(389 GiB)
wal_total:0, db_total:3648715856281, slow_total:0
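(For scale: 0x61230f9000 is ~417.2 GB, matching the 389 GiB shown in
parentheses, and db_total of 3648715856281 bytes is ~3.3 TiB - roughly
95% of the 0x37e3ec00000 (~3.84 TB) device.)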
- - - - - - - -
> Additionally you might want to share rocksdb stats, to be
collected on an offline OSD:
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-1 stats
I've attached bluestore-kv.txt
> Then please set debug-rocksdb & debug-bluestore to 10 and bring up
osd.1 again, which will apparently take some time. What's in the OSD
log then?
The log raced up to 285 MB in a matter of a minute or so. Would you
like me to send this over a WeTransfer link? However, the restart was
quick - most probably because OSD 1 was restarted last week and had
only generated 4 GB of metadata. Some of the OSDs seem to have
maintained a small-ish metadata size while others are back up at
~40 GB or larger (one is 185 GB!).
> Once restarted - please collect a fresh report from 'bluefs stats'
command and share the results.
It appears I'm getting the same output, although the size has dropped
by 4 GB (the meta was only at 4 GB when I restarted):
1 : device size 0x37e3ec00000 : using 0x6033ade000(385 GiB)
wal_total:0, db_total:3648715856281, slow_total:0
On Mon, 21 Mar 2022 at 13:11, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
Hi Chris,
Such meta growth is completely unexpected to me.
And IIUC you've got custom rocksdb settings, right? What's the
rationale for that? I would strongly discourage altering them
without a deep understanding of the consequences...
My current working hypothesis is that DB compaction is not
performed properly during regular operation and is postponed till
OSD restart. Let's try to confirm that.
Could you please share the output for the following command:
ceph tell osd.1 bluefs stats
Additionally you might want to share rocksdb stats, to be
collected on an offline OSD:
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-1 stats
Then please set debug-rocksdb & debug-bluestore to 10 and bring up
osd.1 again, which will apparently take some time. What's in the OSD
log then?
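A minimal sketch of one way to do that via the config database,
assuming osd.1 is currently down (injectargs would only work against
a running daemon):

ceph config set osd.1 debug_rocksdb 10
ceph config set osd.1 debug_bluestore 10
systemctl start ceph-osd@1

Once we're done, the overrides can be dropped again with 'ceph config
rm osd.1 debug_rocksdb' and 'ceph config rm osd.1 debug_bluestore'.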
Once restarted - please collect a fresh report from 'bluefs stats'
command and share the results.
And finally, I would suggest leaving the other OSDs (as well as the
rocksdb settings) intact for a while to be able to troubleshoot
the issue to the end.
Thanks,
Igor
On 3/18/2022 5:38 PM, Chris Page wrote:
This certainly seems to be the case, as running a manual
compaction and restarting works.
And `ceph tell osd.0 compact` reduces metadata consumption from
~160 GB (for 380 GB worth of data) to just 750 MB. Below is a
snippet of my osd stats -
[image: osd stats snippet]
Is this expected behaviour or is my metadata growing
abnormally? OSDs 1, 4 & 11 haven't been restarted in a couple of
weeks.
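A simple way to keep an eye on these per-OSD meta figures, assuming
the META column of 'ceph osd df' is the right thing to watch:

watch -n 60 ceph osd df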
Here are my rocksdb settings -
bluestore_rocksdb_options =
compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,write_buffer_size=64M,compaction_readahead_size=2M
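For what it's worth, one way to double-check what a running OSD has
actually picked up (a sketch, assuming admin socket access on the OSD
host, with osd.0 as an example):

ceph daemon osd.0 config get bluestore_rocksdb_options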
I hope you can help with this one - I'm at a bit of a loss!
Thanks,
Chris.
On Fri, 18 Mar 2022 at 14:25, Chris Page <sirhc.page@xxxxxxxxx>
wrote:
Hi,
Following up from this, is it just normal for them to take a
while? I notice that once I have restarted an OSD, the 'meta'
value drops right down to empty and slowly builds back up.
The restarted OSDs start with just 1 GB or so of metadata and
increase over time to 160-170 GB of metadata.
So perhaps the delay is just the rebuilding of this metadata
pool?
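If it helps, a rough way to watch this per daemon without restarting
anything (a sketch, assuming admin socket access; the bluefs section
of the perf counters includes db_used_bytes):

ceph daemon osd.1 perf dump bluefs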
Thanks,
Chris.
--
Igor Fedotov
Ceph Lead Developer
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx