Re: [Octopus] Beware the on-disk conversion

Igor Fedotov <ifedotov@xxxxxxx> · Fri, 3 Apr 2020 13:50:02 +0300

Thanks, Jack.

One more question please - what's the actual maximum memory consumption 
for this specific OSD during fsck?

And is it backed by a 3, 6 or 10 TB drive?

Regards,

Igor

On 4/2/2020 7:15 PM, Jack wrote:
I do compress:
root@backup2:~# ceph daemon osd.0 config show | grep bluestore_compression
     "bluestore_compression_algorithm": "snappy",
     "bluestore_compression_max_blob_size": "0",
     "bluestore_compression_max_blob_size_hdd": "524288",
     "bluestore_compression_max_blob_size_ssd": "65536",
     "bluestore_compression_min_blob_size": "0",
     "bluestore_compression_min_blob_size_hdd": "8192",
     "bluestore_compression_min_blob_size_ssd": "8192",
     "bluestore_compression_mode": "force",
     "bluestore_compression_required_ratio": "0.955000",

I will deal with the memory consumption.
After all, it just requires more time (starting the OSDs one by one), and it
still fits in my main memory.

Thank you for checking out the issue.

On 4/2/20 5:28 PM, Igor Fedotov wrote:
So this OSD has 32M of shared blobs and fsck loads them all into memory
while processing. Hence the RAM consumption.

I'm afraid there is no simple way to fix that, will create a ticket though.

And a side question:

1) Do you use erasure coding and/or compression for the rbd pool?

These stats look suspicious

POOL  ID  STORED   (DATA)   (OMAP)   OBJECTS  USED     (DATA)   (OMAP)   %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY   USED COMPR  UNDER COMPR
rbd    1  245 TiB  245 TiB  9.0 MiB  50.26M   151 TiB  151 TiB  9.0 MiB  90.03  12 TiB     N/A            N/A          50.26M  35 TiB      144 TiB

Stored - 245 TiB, Used - 151 TiB

Can't imagine any explanation other than applied compression.
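
To make the reasoning explicit, here is a minimal sketch of the arithmetic, using only the figures from the 'ceph df detail' row above. The variable names are illustrative. It assumes USED COMPR is the space allocated for compressed data and UNDER COMPR is the amount of data that went through compression, which is how I read those columns.

```python
# Figures copied from the 'ceph df detail' row for the rbd pool (all in TiB).
stored_tib = 245       # STORED: logical data written by clients
used_tib = 151         # USED: raw space consumed on the OSDs
used_compr_tib = 35    # USED COMPR: space allocated for compressed data (assumed meaning)
under_compr_tib = 144  # UNDER COMPR: data passed through compression (assumed meaning)

# With replication or erasure coding alone, USED would be >= STORED.
# A ratio below 1.0 is what makes the stats look suspicious.
used_to_stored = used_tib / stored_tib

# How well the compressed part of the data shrank, and how much space that saved.
compression_ratio = used_compr_tib / under_compr_tib
saved_tib = under_compr_tib - used_compr_tib

print(f"USED / STORED            = {used_to_stored:.2f}")
print(f"USED COMPR / UNDER COMPR = {compression_ratio:.2f}")
print(f"Space saved by compression: {saved_tib} TiB")
```

USED being well below STORED is the part that replication or erasure coding cannot explain on their own, since both only add overhead.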

Thanks,

Igor

On 4/2/2020 5:59 PM, Jack wrote:
Here it is

On 4/2/20 3:48 PM, Igor Fedotov wrote:
And may I have the output for:

ceph daemon osd.N calc_objectstore_db_histogram

This will collect some stats on record types in OSD's DB.

On 4/2/2020 4:13 PM, Jack wrote:
(fsck / quick-fix, same story)

On 4/2/20 3:12 PM, Jack wrote:
Hi,

A simple fsck eats the same amount of memory

Cluster usage: rbd with a bit of rgw

Here is the ceph df detail
All OSDs are single rusty devices

On 4/2/20 2:19 PM, Igor Fedotov wrote:
Hi Jack,

could you please try the following - stop one of the already converted OSDs
and do a quick-fix/fsck/repair against it using ceph-bluestore-tool:

ceph-bluestore-tool --path <path to osd> --command quick-fix|fsck|repair

Does it cause similar memory usage?

You can stop experimenting if quick-fix reproduces the issue.

Also, could you please describe your cluster and its usage a bit: is it
rgw/rbd/cephfs? If possible, please share the 'ceph df detail' output.
Do you have a standalone DB volume on SSD/NVMe?

Thanks,

Igor

On 4/1/2020 6:28 PM, Jack wrote:
Hi,

As the upgrade documentation says:

  Note that the first time each OSD starts, it will do a format
  conversion to improve the accounting for “omap” data. This may
  take a few minutes to as much as a few hours (for an HDD with lots
  of omap data). You can disable this automatic conversion with:

What the documentation does not say is that this process takes a lot of
memory.

I am upgrading a rusty cluster from Nautilus; you can see the RAM
consumption in the attachment.

First, we have a 3TB OSD conversion: it took ~15min and 19GB of memory.

Then, we have a larger 6TB OSD conversion: it took more than 2 hours
and 35GB of memory.

Finally, we have the largest 10TB OSD: only 1h15, but 52GB of memory.
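
For a rough sense of scale, the three data points above can be normalised to memory per TB of OSD capacity. This is only a back-of-the-envelope sketch on the figures quoted in this message. The variable names are illustrative, and three samples from one cluster are not enough to predict what another cluster will need.

```python
# (OSD size in TB, peak conversion memory in GB, duration as reported above)
conversions = [
    (3, 19, "~15 min"),
    (6, 35, "more than 2 hours"),
    (10, 52, "1h15"),
]

# Memory used per TB of OSD capacity for each observed conversion.
gb_per_tb = [ram_gb / size_tb for size_tb, ram_gb, _ in conversions]

for (size_tb, ram_gb, duration), ratio in zip(conversions, gb_per_tb):
    print(f"{size_tb:>2} TB OSD: {ram_gb} GB RAM ({ratio:.1f} GB per TB), took {duration}")
```

On these numbers the memory footprint grows with the size of the OSD, roughly 5 to 6 GB per TB, while the duration does not follow the same pattern.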

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx