Re: ceph df (octopus) shows USED is 7 times higher than STORED in erasure coded pool

Dear Mr Baergen,

Thanks a lot for your very concise explanation. However, I would like to understand better
why the default BlueStore allocation size causes such a large storage overhead,
and where in the Ceph docs it is explained what to watch for so this phenomenon can be avoided.
I have a feeling this is also what I am seeing on my experimental Ceph setup with a simple jerasure 2+1 data pool.
Could it be caused by journaled RBD writes to the EC data pool?

Josh Baergen wrote:
Hey Arkadiy,

If the OSDs are on HDDs and were created with the default
bluestore_min_alloc_size_hdd, which is still 64KiB in Octopus, then in
effect data will be allocated from the pool in 640KiB chunks (64KiB *
(k+m)). 5.36M objects taking up 501GiB is an average object size of 98KiB
which results in a ratio of 6.53:1 allocated:stored, which is pretty close
to the 7:1 observed.
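
As a back-of-envelope check of that arithmetic (these commands simply restate
the figures above, assuming 64 KiB min_alloc_size and k+m = 10 chunks per object):

echo "64 * 10" | bc                        # KiB allocated per object: 640
echo "501 * 1024 * 1024 / 5360000" | bc    # average object size in KiB: ~98
echo "scale=2; (64 * 10) / 98" | bc        # allocated:stored ratio: ~6.53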

If my assumption about your configuration is correct, then the only way to
fix this is to adjust bluestore_min_alloc_size_hdd and recreate all your
OSDs, which will take a while...
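
For reference, a rough sketch of what that could look like (the exact redeploy
procedure depends on how the OSDs were provisioned, so treat these commands as
an assumption to verify against the Ceph docs for your release):

# Make newly created HDD OSDs use a smaller allocation unit, e.g. 4 KiB;
# existing OSDs keep the value they were formatted with:
ceph config set osd bluestore_min_alloc_size_hdd 4096

# Then drain and re-create the OSDs one failure domain at a time so they
# pick up the new value, waiting for recovery to finish in between:
ceph osd out <osd-id>
# ... once the data has migrated off, zap and re-deploy that OSD ...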

Josh

On Tue, Jun 29, 2021 at 3:07 PM Arkadiy Kulev <eth@xxxxxxxxxxxx> wrote:

The pool *default.rgw.buckets.data* has *501 GiB* stored, but USED shows
*3.5 TiB* (7 times higher!):

root@ceph-01:~# ceph df
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    196 TiB  193 TiB  3.5 TiB   3.6 TiB       1.85
TOTAL  196 TiB  193 TiB  3.5 TiB   3.6 TiB       1.85

--- POOLS ---
POOL                       ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics       1    1   19 KiB       12   56 KiB      0     61 TiB
.rgw.root                   2   32  2.6 KiB        6  1.1 MiB      0     61 TiB
default.rgw.log             3   32  168 KiB      210   13 MiB      0     61 TiB
default.rgw.control         4   32      0 B        8      0 B      0     61 TiB
default.rgw.meta            5    8  4.8 KiB       11  1.9 MiB      0     61 TiB
default.rgw.buckets.index   6    8  1.6 GiB      211  4.7 GiB      0     61 TiB

default.rgw.buckets.data   10  128  501 GiB    5.36M  3.5 TiB   1.90    110 TiB

The *default.rgw.buckets.data* pool is using erasure coding:

root@ceph-01:~# ceph osd erasure-code-profile get EC_RGW_HOST
crush-device-class=hdd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=6
m=4
plugin=jerasure
technique=reed_sol_van
w=8

If anyone could help explain why it's using up 7 times more space, it would
help a lot. Versioning is disabled. ceph version 15.2.13 (octopus stable).

Sincerely,
Ark.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
