I started my experimental 1-host/8-HDDs setup in 2018 with Luminous,
and I read https://ceph.io/community/new-luminous-erasure-coding-rbd-cephfs/ ,
which got me interested in using BlueStore and EC pools with overwrites enabled for RBD data.
I have about 22 TiB of raw storage, and ceph df shows this:
--- RAW STORAGE ---
CLASS  SIZE    AVAIL    USED    RAW USED  %RAW USED
hdd    22 TiB  2.7 TiB  19 TiB  19 TiB        87.78
TOTAL  22 TiB  2.7 TiB  19 TiB  19 TiB        87.78

--- POOLS ---
POOL                   ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
jerasure21              1  256  9.0 TiB    2.32M   13 TiB  97.06    276 GiB
libvirt                 2  128  1.5 TiB  413.60k  4.5 TiB  91.77    140 GiB
rbd                     3   32  798 KiB        5  2.7 MiB      0    138 GiB
iso                     4   32  2.3 MiB       10  8.0 MiB      0    138 GiB
device_health_metrics   5    1   31 MiB        9   94 MiB   0.02    138 GiB
If I add up USED for libvirt and jerasure21, I get 17.5 TiB, and RAW STORAGE/AVAIL shows 2.7 TiB.
The sum of POOLS/MAX AVAIL is about 830 GiB, so where are the other 2.7 - 0.83 =~ 1.87 TiB?
Or, in other words, where are the (RAW STORAGE/RAW USED) - SUM(POOLS/USED) = 19 - 17.5 = 1.5 TiB?
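A rough sanity check of these numbers in Python (the 1.5x factor assumes jerasure21 is the 2+1 EC pool mentioned further down; the 3x factor for libvirt is only my guess from its 4.5/1.5 USED:STORED ratio):

# Figures copied from the `ceph df` output above; the redundancy factors
# below are inferred, not queried from the cluster.
pool_stored_tib = {"jerasure21": 9.0, "libvirt": 1.5}    # STORED column
pool_used_tib   = {"jerasure21": 13.0, "libvirt": 4.5}   # USED column
raw_used_tib    = 19.0

raw_factor = {"jerasure21": 3 / 2,   # EC 2+1 -> (k+m)/k = 1.5x (assumed)
              "libvirt":    3.0}     # replicated size=3 (guessed from 4.5/1.5)

for pool, stored in pool_stored_tib.items():
    print(f"{pool}: expect ~{stored * raw_factor[pool]:.1f} TiB raw, "
          f"ceph df reports {pool_used_tib[pool]:.1f} TiB")

attributed = sum(pool_used_tib.values())            # 17.5 TiB
print(f"attributed to pools: {attributed:.1f} TiB")
print(f"not attributed:      {raw_used_tib - attributed:.1f} TiB")
# My understanding is that RAW USED also counts space the OSDs consume
# outside of pool data (BlueFS/RocksDB metadata, WAL, allocation rounding),
# which `ceph df` does not attribute back to any pool.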
As it does not seem I will get any more hosts for this setup,
I am seriously considering tearing this Ceph cluster down
and instead setting up Btrfs with qcow2 images served over iSCSI,
which looks simpler to me for a single-host situation.
Josh Baergen wrote:
Hey Wladimir,
I actually don't know where this is referenced in the docs, if anywhere. Googling around shows many people discovering this overhead the hard way on ceph-users.
I also don't know the rbd journaling mechanism in enough depth to comment on whether it could be causing this issue for you. Are you seeing a high
allocated:stored ratio on your cluster?
Josh
On Sun, Jul 4, 2021 at 6:52 AM Wladimir Mutel <mwg@xxxxxxxxx> wrote:
Dear Mr Baergen,
thanks a lot for your very concise explanation.
However, I would like to learn more about why the default BlueStore allocation size causes such a big storage overhead,
and where in the Ceph docs it is explained what to watch for to avoid hitting this phenomenon again and again.
I have a feeling this is what I am getting on my experimental Ceph setup with its simple jerasure 2+1 data pool.
Could it be caused by journaled RBD writes to the EC data pool?
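To illustrate the effect I suspect: with the default 64 KiB bluestore_min_alloc_size_hdd, every chunk of an object on a 2+1 EC pool is rounded up to 64 KiB on each of the three OSDs it touches, so small objects (as journal entries might well be, if that is indeed the cause) allocate far more than they store. A minimal sketch of that rounding, purely illustrative:

import math

MIN_ALLOC_KIB = 64   # default bluestore_min_alloc_size_hdd up to Octopus
K, M = 2, 1          # jerasure 2+1 profile

def allocated_kib(object_kib: float) -> int:
    """Worst-case allocation for one object on an EC k+m pool: the object
    is striped into k data chunks plus m coding chunks, and each chunk is
    rounded up to min_alloc_size on its OSD."""
    per_chunk = math.ceil(object_kib / K / MIN_ALLOC_KIB) * MIN_ALLOC_KIB
    return per_chunk * (K + M)

for size in (4, 16, 64, 1024, 4096):   # object sizes in KiB
    alloc = allocated_kib(size)
    print(f"{size:>5} KiB object -> {alloc:>5} KiB allocated ({alloc / size:.1f}:1)")

# The 1.5:1 floor is the inherent 2+1 redundancy; everything above it is
# min_alloc_size padding, which only matters for small objects.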
Josh Baergen wrote:
> Hey Arkadiy,
>
> If the OSDs are on HDDs and were created with the default
> bluestore_min_alloc_size_hdd, which is still 64KiB in Octopus, then in
> effect data will be allocated from the pool in 640KiB chunks (64KiB *
> (k+m)). 5.36M objects taking up 501GiB is an average object size of 98KiB
> which results in a ratio of 6.53:1 allocated:stored, which is pretty close
> to the 7:1 observed.
>
> If my assumption about your configuration is correct, then the only way to
> fix this is to adjust bluestore_min_alloc_size_hdd and recreate all your
> OSDs, which will take a while...
>
> Josh
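Spelling out Josh's arithmetic in Python (figures from the ceph df output quoted further down; the 64 KiB default is an assumption, as he notes):

MIN_ALLOC_KIB = 64        # assumed default bluestore_min_alloc_size_hdd
K, M = 6, 4               # from the EC_RGW_HOST profile quoted below

stored_gib = 501
objects    = 5.36e6
used_tib   = 3.5

avg_object_kib = stored_gib * 1024 * 1024 / objects          # ~98 KiB
min_alloc_per_object_kib = MIN_ALLOC_KIB * (K + M)           # 640 KiB
# 640 KiB per object holds as long as each of the k data shards stores
# <= 64 KiB of the object, i.e. for objects up to k * 64 KiB = 384 KiB.
predicted = min_alloc_per_object_kib / avg_object_kib        # ~6.5:1
observed  = used_tib * 1024 / stored_gib                     # ~7.2:1

print(f"average object size : {avg_object_kib:.0f} KiB")
print(f"min alloc per object: {min_alloc_per_object_kib} KiB")
print(f"predicted ratio     : {predicted:.2f}:1 vs observed {observed:.2f}:1")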
>
> On Tue, Jun 29, 2021 at 3:07 PM Arkadiy Kulev <eth@xxxxxxxxxxxx> wrote:
>
>> The pool default.rgw.buckets.data has 501 GiB stored, but USED shows
>> 3.5 TiB (7 times higher!):
>>
>> root@ceph-01:~# ceph df
>> --- RAW STORAGE ---
>> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
>> hdd    196 TiB  193 TiB  3.5 TiB  3.6 TiB        1.85
>> TOTAL  196 TiB  193 TiB  3.5 TiB  3.6 TiB        1.85
>>
>> --- POOLS ---
>> POOL                       ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
>> device_health_metrics       1    1   19 KiB       12   56 KiB      0     61 TiB
>> .rgw.root                   2   32  2.6 KiB        6  1.1 MiB      0     61 TiB
>> default.rgw.log             3   32  168 KiB      210   13 MiB      0     61 TiB
>> default.rgw.control         4   32      0 B        8      0 B      0     61 TiB
>> default.rgw.meta            5    8  4.8 KiB       11  1.9 MiB      0     61 TiB
>> default.rgw.buckets.index   6    8  1.6 GiB      211  4.7 GiB      0     61 TiB
>> default.rgw.buckets.data   10  128  501 GiB    5.36M  3.5 TiB   1.90    110 TiB
>>
>> The default.rgw.buckets.data pool is using erasure coding:
>>
>> root@ceph-01:~# ceph osd erasure-code-profile get EC_RGW_HOST
>> crush-device-class=hdd
>> crush-failure-domain=host
>> crush-root=default
>> jerasure-per-chunk-alignment=false
>> k=6
>> m=4
>> plugin=jerasure
>> technique=reed_sol_van
>> w=8
>>
>> If anyone could help explain why it's using up 7 times more space, it would
>> help a lot. Versioning is disabled. ceph version 15.2.13 (octopus stable).
>>
>> Sincerely,
>> Ark.
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx