Oh, I just read your message again, and I see that I didn't answer your
question. :D I admit I don't know how MAX AVAIL is calculated, and whether
it takes things like imbalance into account (it might).

Josh

On Tue, Jul 6, 2021 at 7:41 AM Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx> wrote:

> Hey Wladimir,
>
> That output looks like it's from Nautilus or later. My understanding is
> that the USED column is in raw bytes, whereas STORED is "user" bytes. If
> you're using EC 2:1 for all of those pools, I would expect USED to be at
> least 1.5x STORED, which looks to be the case for jerasure21. Perhaps your
> libvirt pool is 3x replicated, in which case the numbers add up as well.
>
> Josh
>
> On Tue, Jul 6, 2021 at 5:51 AM Wladimir Mutel <mwg@xxxxxxxxx> wrote:
>
>> I started my experimental 1-host/8-HDDs setup in 2018 with Luminous, and
>> I read https://ceph.io/community/new-luminous-erasure-coding-rbd-cephfs/ ,
>> which had interested me in using Bluestore and rewritable EC pools for
>> RBD data.
>> I have about 22 TiB of raw storage, and ceph df shows this:
>>
>> --- RAW STORAGE ---
>> CLASS  SIZE    AVAIL    USED    RAW USED  %RAW USED
>> hdd    22 TiB  2.7 TiB  19 TiB  19 TiB    87.78
>> TOTAL  22 TiB  2.7 TiB  19 TiB  19 TiB    87.78
>>
>> --- POOLS ---
>> POOL                   ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
>> jerasure21              1  256  9.0 TiB    2.32M  13 TiB   97.06    276 GiB
>> libvirt                 2  128  1.5 TiB  413.60k  4.5 TiB  91.77    140 GiB
>> rbd                     3   32  798 KiB        5  2.7 MiB      0    138 GiB
>> iso                     4   32  2.3 MiB       10  8.0 MiB      0    138 GiB
>> device_health_metrics   5    1   31 MiB        9   94 MiB   0.02    138 GiB
>>
>> If I add USED for libvirt and jerasure21, I get 17.5 TiB, while 2.7 TiB
>> is shown as RAW STORAGE/AVAIL.
>> The sum of POOLS/MAX AVAIL is about 840 GiB; where are my other
>> 2.7 - 0.840 =~ 1.86 TiB?
>> Or in other words, where are my (RAW STORAGE/RAW USED) - (SUM(POOLS/USED))
>> = 19 - 17.5 = 1.5 TiB?
>>
>> As it does not seem I will get any more hosts for this setup, I am
>> seriously thinking of bringing this Ceph cluster down and instead setting
>> up Btrfs storing qcow2 images served over iSCSI, which looks simpler to
>> me for a single-host situation.
>>
>> Josh Baergen wrote:
>> > Hey Wladimir,
>> >
>> > I actually don't know where this is referenced in the docs, if anywhere.
>> > Googling around shows many people discovering this overhead the hard
>> > way on ceph-users.
>> >
>> > I also don't know the rbd journaling mechanism in enough depth to
>> > comment on whether it could be causing this issue for you. Are you
>> > seeing a high allocated:stored ratio on your cluster?
>> >
>> > Josh
>> >
>> > On Sun, Jul 4, 2021 at 6:52 AM Wladimir Mutel <mwg@xxxxxxxxx> wrote:
>> >
>> > Dear Mr Baergen,
>> >
>> > Thanks a lot for your very concise explanation. However, I would like
>> > to learn more about why the default Bluestore allocation size causes
>> > such a big storage overhead, and where in the Ceph docs it is explained
>> > what to watch for to avoid hitting this phenomenon again and again.
>> > I have a feeling this is what I get on my experimental Ceph setup with
>> > the simplest JErasure 2+1 data pool.
>> > Could it be caused by journaled RBD writes to the EC data pool?
>> >
>> > Josh Baergen wrote:
>> > > Hey Arkadiy,
>> > >
>> > > If the OSDs are on HDDs and were created with the default
>> > > bluestore_min_alloc_size_hdd, which is still 64KiB in Octopus, then
>> > > in effect data will be allocated from the pool in 640KiB chunks
>> > > (64KiB * (k+m)). 5.36M objects taking up 501GiB is an average object
>> > > size of 98KiB, which results in a ratio of 6.53:1 allocated:stored,
>> > > which is pretty close to the 7:1 observed.
>> > >
>> > > If my assumption about your configuration is correct, then the only
>> > > way to fix this is to adjust bluestore_min_alloc_size_hdd and
>> > > recreate all your OSDs, which will take a while...
>> > >
>> > > Josh
>> > >
>> > > On Tue, Jun 29, 2021 at 3:07 PM Arkadiy Kulev <eth@xxxxxxxxxxxx> wrote:
>> > >
>> > >> The pool default.rgw.buckets.data has 501 GiB stored, but USED shows
>> > >> 3.5 TiB (7 times higher!):
>> > >>
>> > >> root@ceph-01:~# ceph df
>> > >> --- RAW STORAGE ---
>> > >> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
>> > >> hdd    196 TiB  193 TiB  3.5 TiB  3.6 TiB   1.85
>> > >> TOTAL  196 TiB  193 TiB  3.5 TiB  3.6 TiB   1.85
>> > >>
>> > >> --- POOLS ---
>> > >> POOL                       ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
>> > >> device_health_metrics       1    1   19 KiB       12   56 KiB      0     61 TiB
>> > >> .rgw.root                   2   32  2.6 KiB        6  1.1 MiB      0     61 TiB
>> > >> default.rgw.log             3   32  168 KiB      210   13 MiB      0     61 TiB
>> > >> default.rgw.control         4   32      0 B        8      0 B      0     61 TiB
>> > >> default.rgw.meta            5    8  4.8 KiB       11  1.9 MiB      0     61 TiB
>> > >> default.rgw.buckets.index   6    8  1.6 GiB      211  4.7 GiB      0     61 TiB
>> > >> default.rgw.buckets.data   10  128  501 GiB    5.36M  3.5 TiB   1.90    110 TiB
>> > >>
>> > >> The default.rgw.buckets.data pool is using erasure coding:
>> > >>
>> > >> root@ceph-01:~# ceph osd erasure-code-profile get EC_RGW_HOST
>> > >> crush-device-class=hdd
>> > >> crush-failure-domain=host
>> > >> crush-root=default
>> > >> jerasure-per-chunk-alignment=false
>> > >> k=6
>> > >> m=4
>> > >> plugin=jerasure
>> > >> technique=reed_sol_van
>> > >> w=8
>> > >>
>> > >> If anyone could help explain why it's using up 7 times more space,
>> > >> it would help a lot. Versioning is disabled. ceph version 15.2.13
>> > >> (octopus stable).
>> > >>
>> > >> Sincerely,
>> > >> Ark.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
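
As a quick sanity check of Josh's USED-versus-STORED explanation, the ratios
in Wladimir's ceph df output can be recomputed directly. The figures below
are copied from the quoted output (and are the rounded values ceph prints);
the Python snippet is only an illustration of the arithmetic, not anything
Ceph itself provides, and the 3x replication of the libvirt pool is Josh's
assumption, not confirmed in the thread.

# Sanity check of the USED vs STORED ratios from Wladimir's "ceph df" output.
# The TiB figures are the rounded values ceph prints, so ratios are approximate.

pools = {
    # pool name:  (STORED in TiB, USED in TiB, expected raw multiplier)
    "jerasure21": (9.0, 13.0, (2 + 1) / 2),  # EC 2+1 -> (k+m)/k = 1.5x
    "libvirt":    (1.5,  4.5, 3.0),          # assumed 3x replicated
}

for name, (stored, used, expected) in pools.items():
    print(f"{name}: USED/STORED = {used / stored:.2f} (expected ~{expected:.2f})")

# Output:
# jerasure21: USED/STORED = 1.44 (expected ~1.50)
# libvirt: USED/STORED = 3.00 (expected ~3.00)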
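
Josh's allocation-overhead arithmetic for Arkadiy's default.rgw.buckets.data
pool can be reproduced the same way. This is a minimal sketch assuming the
default bluestore_min_alloc_size_hdd of 64 KiB and the k=6, m=4 profile
quoted above, so every object occupies at least 64 KiB * (k + m) = 640 KiB
of raw space no matter how small it is; all input numbers come from the
quoted ceph df output.

# Back-of-the-envelope reproduction of the 6.53:1 allocated:stored ratio for
# default.rgw.buckets.data, using only numbers quoted in the thread.

KIB = 1024
GIB = 1024 ** 3

min_alloc_hdd = 64 * KIB   # bluestore_min_alloc_size_hdd default (Octopus and earlier)
k, m = 6, 4                # from the EC_RGW_HOST profile above

stored_bytes = 501 * GIB   # STORED column for default.rgw.buckets.data
objects = 5.36e6           # OBJECTS column for the same pool

avg_object = stored_bytes / objects             # ~98 KiB average object size
raw_floor_per_object = min_alloc_hdd * (k + m)  # >= 640 KiB allocated per object
ratio = raw_floor_per_object / avg_object       # allocated:stored

print(f"average object size   : {avg_object / KIB:.0f} KiB")
print(f"raw floor per object  : {raw_floor_per_object / KIB:.0f} KiB")
print(f"allocated:stored ratio: {ratio:.2f}:1")  # ~6.53:1, close to the observed ~7:1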