Hey Wladimir,

I actually don't know where this is referenced in the docs, if anywhere. Googling around shows many people discovering this overhead the hard way on ceph-users.

I also don't know the rbd journaling mechanism in enough depth to comment on whether it could be causing this issue for you. Are you seeing a high allocated:stored ratio on your cluster?

Josh

On Sun, Jul 4, 2021 at 6:52 AM Wladimir Mutel <mwg@xxxxxxxxx> wrote:
> Dear Mr Baergen,
>
> Thanks a lot for your very concise explanation. However, I would like to
> learn more about why the default BlueStore allocation size causes such a
> big storage overhead, and where in the Ceph docs it is explained what to
> watch for to avoid hitting this phenomenon again and again.
> I have a feeling this is what I get on my experimental Ceph setup with
> the simplest JErasure 2+1 data pool.
> Could it be caused by journaled RBD writes to an EC data pool?
>
> Josh Baergen wrote:
> > Hey Arkadiy,
> >
> > If the OSDs are on HDDs and were created with the default
> > bluestore_min_alloc_size_hdd, which is still 64KiB in Octopus, then in
> > effect data will be allocated from the pool in 640KiB chunks (64KiB *
> > (k+m)). 5.36M objects taking up 501GiB is an average object size of
> > 98KiB, which gives an allocated:stored ratio of 6.53:1, pretty close
> > to the 7:1 observed.
> >
> > If my assumption about your configuration is correct, then the only
> > way to fix this is to adjust bluestore_min_alloc_size_hdd and recreate
> > all your OSDs, which will take a while...
> >
> > Josh
> >
> > On Tue, Jun 29, 2021 at 3:07 PM Arkadiy Kulev <eth@xxxxxxxxxxxx> wrote:
> >
> >> The pool *default.rgw.buckets.data* has *501 GiB* stored, but USED
> >> shows *3.5 TiB* (7 times higher!):
> >>
> >> root@ceph-01:~# ceph df
> >> --- RAW STORAGE ---
> >> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
> >> hdd    196 TiB  193 TiB  3.5 TiB  3.6 TiB        1.85
> >> TOTAL  196 TiB  193 TiB  3.5 TiB  3.6 TiB        1.85
> >>
> >> --- POOLS ---
> >> POOL                       ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
> >> device_health_metrics       1    1   19 KiB       12   56 KiB      0     61 TiB
> >> .rgw.root                   2   32  2.6 KiB        6  1.1 MiB      0     61 TiB
> >> default.rgw.log             3   32  168 KiB      210   13 MiB      0     61 TiB
> >> default.rgw.control         4   32      0 B        8      0 B      0     61 TiB
> >> default.rgw.meta            5    8  4.8 KiB       11  1.9 MiB      0     61 TiB
> >> default.rgw.buckets.index   6    8  1.6 GiB      211  4.7 GiB      0     61 TiB
> >> default.rgw.buckets.data   10  128  501 GiB    5.36M  3.5 TiB   1.90    110 TiB
> >>
> >> The *default.rgw.buckets.data* pool is using erasure coding:
> >>
> >> root@ceph-01:~# ceph osd erasure-code-profile get EC_RGW_HOST
> >> crush-device-class=hdd
> >> crush-failure-domain=host
> >> crush-root=default
> >> jerasure-per-chunk-alignment=false
> >> k=6
> >> m=4
> >> plugin=jerasure
> >> technique=reed_sol_van
> >> w=8
> >>
> >> If anyone could help explain why it's using up 7 times more space, it
> >> would help a lot. Versioning is disabled. ceph version 15.2.13
> >> (octopus stable).
> >>
> >> Sincerely,
> >> Ark.
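
[Editor's note: a minimal back-of-the-envelope sketch of the allocation math in Josh's reply, assuming the default bluestore_min_alloc_size_hdd of 64 KiB and the 6+4 jerasure profile shown above; the object count and stored size are taken from the quoted ceph df output, nothing here queries a live cluster.]

import math

# Numbers taken from the thread: 64 KiB bluestore_min_alloc_size_hdd,
# a 6+4 jerasure profile, and 5.36M objects storing 501 GiB.
MIN_ALLOC = 64 * 1024      # bytes per allocation unit on an HDD OSD
K, M = 6, 4                # erasure-code data and coding chunks
OBJECTS = 5.36e6
STORED = 501 * 1024**3     # logical bytes stored in the pool

avg_obj = STORED / OBJECTS                       # ~98 KiB per object

# Simplified model: each object is split into K data chunks, and every
# chunk (data and coding) is rounded up to MIN_ALLOC on its OSD.
alloc_per_obj = math.ceil(avg_obj / K / MIN_ALLOC) * MIN_ALLOC * (K + M)

print(f"average object size : {avg_obj / 1024:6.1f} KiB")          # ~98 KiB
print(f"allocated per object: {alloc_per_obj / 1024:6.0f} KiB")    # 640 KiB
print(f"allocated:stored    : {alloc_per_obj / avg_obj:.2f}:1")    # ~6.53:1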
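
[Editor's note: since the question above is whether the cluster shows a high allocated:stored ratio, here is one hedged way to eyeball it per pool. It shells out to ceph df --format json and assumes the "stored" and "bytes_used" field names as seen on Octopus; treat it as a sketch, not a supported tool.]

import json
import subprocess

# Sketch: approximate the allocated:stored ratio per pool from
# `ceph df --format json`. For an EC pool the (k+m)/k parity overhead
# is part of the ratio, so compare against that baseline
# (about 1.67:1 for 6+4) rather than against 1.0.
raw = subprocess.run(
    ["ceph", "df", "--format", "json"],
    check=True, capture_output=True, text=True,
).stdout

for pool in json.loads(raw)["pools"]:
    stats = pool["stats"]
    if stats.get("stored"):
        ratio = stats["bytes_used"] / stats["stored"]
        print(f"{pool['name']:<30} {ratio:6.2f}:1 allocated:stored")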