Ceph on a single host makes little to no sense. You're better off running
something like ZFS.

On Tue, 6 Jul 2021 at 23:52, Wladimir Mutel <mwg@xxxxxxxxx> wrote:

> I started my experimental 1-host/8-HDDs setup in 2018 with Luminous,
> and I read
> https://ceph.io/community/new-luminous-erasure-coding-rbd-cephfs/ ,
> which had interested me in using Bluestore and rewriteable EC pools
> for RBD data.
> I have about 22 TiB of raw storage, and ceph df shows this:
>
> --- RAW STORAGE ---
> CLASS  SIZE    AVAIL    USED    RAW USED  %RAW USED
> hdd    22 TiB  2.7 TiB  19 TiB  19 TiB        87.78
> TOTAL  22 TiB  2.7 TiB  19 TiB  19 TiB        87.78
>
> --- POOLS ---
> POOL                   ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
> jerasure21              1  256  9.0 TiB    2.32M  13 TiB   97.06    276 GiB
> libvirt                 2  128  1.5 TiB  413.60k  4.5 TiB  91.77    140 GiB
> rbd                     3   32  798 KiB        5  2.7 MiB      0    138 GiB
> iso                     4   32  2.3 MiB       10  8.0 MiB      0    138 GiB
> device_health_metrics   5    1   31 MiB        9   94 MiB   0.02    138 GiB
>
> If I add USED for libvirt and jerasure21, I get 17.5 TiB, while 2.7 TiB
> is shown as RAW STORAGE/AVAIL.
> The sum of POOLS/MAX AVAIL is about 840 GiB; where is my other
> 2.7 - 0.840 =~ 1.86 TiB?
> Or, in other words, where is my (RAW STORAGE/RAW USED) -
> (SUM(POOLS/USED)) = 19 - 17.5 = 1.5 TiB?
>
> As it does not seem I will get any more hosts for this setup,
> I am seriously thinking of bringing down this Ceph cluster and instead
> setting up a Btrfs volume storing qcow2 images served over iSCSI,
> which looks simpler to me for a single-host situation.
>
> Josh Baergen wrote:
> > Hey Wladimir,
> >
> > I actually don't know where this is referenced in the docs, if
> > anywhere. Googling around shows many people discovering this overhead
> > the hard way on ceph-users.
> >
> > I also don't know the rbd journaling mechanism in enough depth to
> > comment on whether it could be causing this issue for you. Are you
> > seeing a high allocated:stored ratio on your cluster?
> >
> > Josh
> >
> > On Sun, Jul 4, 2021 at 6:52 AM Wladimir Mutel <mwg@xxxxxxxxx> wrote:
> >
> >     Dear Mr Baergen,
> >
> >     thanks a lot for your very concise explanation,
> >     however I would like to learn more about why the default Bluestore
> >     allocation size causes such a big storage overhead,
> >     and where in the Ceph docs it is explained what to watch for
> >     to avoid hitting this phenomenon again and again.
> >     I have a feeling this is what I get on my experimental Ceph setup
> >     with the simplest JErasure 2+1 data pool.
> >     Could it be caused by journaled RBD writes to the EC data pool?
> >
> >     Josh Baergen wrote:
> >     > Hey Arkadiy,
> >     >
> >     > If the OSDs are on HDDs and were created with the default
> >     > bluestore_min_alloc_size_hdd, which is still 64KiB in Octopus,
> >     > then in effect data will be allocated from the pool in 640KiB
> >     > chunks (64KiB * (k+m)). 5.36M objects taking up 501GiB is an
> >     > average object size of 98KiB, which results in a ratio of 6.53:1
> >     > allocated:stored, which is pretty close to the 7:1 observed.
> >     >
> >     > If my assumption about your configuration is correct, then the
> >     > only way to fix this is to adjust bluestore_min_alloc_size_hdd
> >     > and recreate all your OSDs, which will take a while...
> >     >
> >     > Josh
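To sanity-check the arithmetic quoted just above, here is a small Python
sketch that reproduces it from the numbers in Arkadiy's ceph df output
quoted further down (501 GiB stored across 5.36M objects, k=6/m=4). It
assumes the Octopus default bluestore_min_alloc_size_hdd of 64 KiB, as
Josh did; if the OSDs were built with a different value, adjust it:

    # Back-of-the-envelope check of the allocated:stored ratio described above.
    # Assumes the Octopus default bluestore_min_alloc_size_hdd of 64 KiB and
    # the k=6/m=4 profile quoted below; both are assumptions about the cluster.

    KIB = 1024
    GIB = 1024 ** 3
    TIB = 1024 ** 4

    min_alloc = 64 * KIB      # bluestore_min_alloc_size_hdd (Octopus default)
    k, m = 6, 4               # from the EC_RGW_HOST profile quoted below
    stored = 501 * GIB        # STORED for default.rgw.buckets.data
    objects = 5.36e6          # OBJECTS for default.rgw.buckets.data

    avg_object = stored / objects            # ~98 KiB per RGW object
    # Each object is split into k data chunks plus m coding chunks, and every
    # chunk occupies at least one min_alloc unit on its OSD.
    min_footprint = (k + m) * min_alloc      # 640 KiB floor per object

    print(f"average object size: {avg_object / KIB:.0f} KiB")
    print(f"allocated:stored   : {min_footprint / avg_object:.2f}:1")
    print(f"projected USED     : {objects * min_footprint / TIB:.2f} TiB "
          f"(ceph df reports 3.5 TiB)")

With objects this small the per-chunk rounding dominates, and if I remember
right later releases drop the HDD default allocation size to 4 KiB, which is
why recreating the OSDs with a smaller bluestore_min_alloc_size_hdd, as
suggested above, shrinks the ratio.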
> >     > On Tue, Jun 29, 2021 at 3:07 PM Arkadiy Kulev <eth@xxxxxxxxxxxx> wrote:
> >     >
> >     >> The pool *default.rgw.buckets.data* has *501 GiB* stored, but USED
> >     >> shows *3.5 TiB* (7 times higher!):
> >     >>
> >     >> root@ceph-01:~# ceph df
> >     >> --- RAW STORAGE ---
> >     >> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
> >     >> hdd    196 TiB  193 TiB  3.5 TiB  3.6 TiB        1.85
> >     >> TOTAL  196 TiB  193 TiB  3.5 TiB  3.6 TiB        1.85
> >     >>
> >     >> --- POOLS ---
> >     >> POOL                       ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
> >     >> device_health_metrics       1    1   19 KiB       12   56 KiB      0     61 TiB
> >     >> .rgw.root                   2   32  2.6 KiB        6  1.1 MiB      0     61 TiB
> >     >> default.rgw.log             3   32  168 KiB      210   13 MiB      0     61 TiB
> >     >> default.rgw.control         4   32      0 B        8      0 B      0     61 TiB
> >     >> default.rgw.meta            5    8  4.8 KiB       11  1.9 MiB      0     61 TiB
> >     >> default.rgw.buckets.index   6    8  1.6 GiB      211  4.7 GiB      0     61 TiB
> >     >> default.rgw.buckets.data   10  128  501 GiB    5.36M  3.5 TiB   1.90    110 TiB
> >     >>
> >     >> The *default.rgw.buckets.data* pool is using erasure coding:
> >     >>
> >     >> root@ceph-01:~# ceph osd erasure-code-profile get EC_RGW_HOST
> >     >> crush-device-class=hdd
> >     >> crush-failure-domain=host
> >     >> crush-root=default
> >     >> jerasure-per-chunk-alignment=false
> >     >> k=6
> >     >> m=4
> >     >> plugin=jerasure
> >     >> technique=reed_sol_van
> >     >> w=8
> >     >>
> >     >> If anyone could help explain why it's using up 7 times more space,
> >     >> it would help a lot. Versioning is disabled. ceph version 15.2.13
> >     >> (octopus stable).
> >     >>
> >     >> Sincerely,
> >     >> Ark.
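Coming back to the MAX AVAIL question at the top of the thread: those
per-pool figures all describe the same shared free space, each divided by
that pool's raw-space multiplier, so they are not meant to add up to
RAW STORAGE/AVAIL. A rough sketch using the quoted numbers; it assumes the
replicated pools use the default size 3 and that jerasure21 is the 2+1 EC
pool mentioned above, since neither setting is visible in the output:

    # Rough check of the MAX AVAIL figures in the ceph df output at the top
    # of the thread. Assumes the replicated pools use size 3 and jerasure21
    # is a k=2/m=1 EC pool; neither is shown in the quoted output.

    # (pool, MAX AVAIL in GiB, raw bytes consumed per byte stored)
    pools = [
        ("jerasure21",            276, 3 / 2),  # EC 2+1
        ("libvirt",               140, 3),      # assumed 3x replication
        ("rbd",                   138, 3),
        ("iso",                   138, 3),
        ("device_health_metrics", 138, 3),
    ]

    for name, max_avail_gib, raw_multiplier in pools:
        raw_headroom = max_avail_gib * raw_multiplier
        print(f"{name:<22} {max_avail_gib:>4} GiB usable -> ~{raw_headroom:.0f} GiB raw")

    # Every pool maps back to roughly the same ~415 GiB of raw headroom; each
    # MAX AVAIL is that single shared headroom divided by the pool's own raw
    # multiplier, which is why the per-pool values do not sum to AVAIL.

That headroom is also well under the 2.7 TiB AVAIL because, as far as I
understand it, MAX AVAIL is projected from the most-full OSD and the
full-ratio reserve rather than from the average free space, and at ~88%
raw utilisation there is very little slack left.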