Re: Bluestore compression - Which algo to choose? Zstd really still that bad?

I don't have the most experience with Ceph, as my use case is a homelab
and I am only a few months in. I enabled compression on my VM (Proxmox
hosts) disk RBD pool using mode = aggressive, algorithm = lz4, with all
other compression settings left at their defaults. After copying all of
the VM disks to another storage backend and then back to the Ceph pool, I
saw a nearly 50% reduction in the space needed for the VM disks. I have
not had a chance to benchmark the VM disks with compression yet, as I am
waiting for the cluster to calm down from some other disk moves.
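
For reference, these were just two per-pool settings; a rough sketch,
assuming a pool named "vm-pool" (substitute your own pool name):

    ceph osd pool set vm-pool compression_mode aggressive
    ceph osd pool set vm-pool compression_algorithm lz4

On recent releases the effect is visible in the USED COMPR / UNDER COMPR
columns of "ceph df detail".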

On Tue, Jun 27, 2023 at 7:01 AM Christian Rohmann
<christian.rohmann@xxxxxxxxx> wrote:

> Hey Igor,
>
> On 27/06/2023 12:06, Igor Fedotov wrote:
> > I can't say anything about your primary question on zstd
> > benefits/drawbacks, but I'd like to emphasize that the compression
> > ratio at the BlueStore level is (to a major degree) determined by the
> > input data flow characteristics (primarily the write block size), the
> > object store allocation unit size (bluestore_min_alloc_size) and some
> > parameters (e.g. maximum blob size) that determine how input data
> > chunks are logically split when landing on disk.
> > E.g. if one has min_alloc_size set to 4K and the write block size is
> > in (4K, 8K], then the resulting compressed block will never be smaller
> > than 4K. Hence the compression ratio is never more than 2.
> > Similarly, if min_alloc_size is 64K, there would be no benefit from
> > compression at all for the above input, since target allocation units
> > are always larger than the input blocks.
> > The rationale for this behavior is that compression is applied
> > exclusively to individual input blocks - there is no additional
> > processing to merge input and existing data and compress them all
> > together.
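> > To put rough numbers on it: an 8K write that compresses down to 1K
> > still occupies one full 4K allocation unit on disk, so the effective
> > ratio is capped at 8K / 4K = 2 no matter how compressible the data is.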
>
>
> Thanks for the emphasis on the input data and its block size. Yes, that
> is certainly the most important factor for compression efficiency and
> the choice of a suitable algorithm for a given use-case.
> In my case the pool is RBD only, so (by default) the objects are 4M if
> I am not mistaken. I also understand that, even though larger blocks
> generally compress better, there is no relation between different
> blocks in regard to compression dictionaries (along the lines of
> de-duplication). In the end, in my use-case it boils down to the type
> of data stored on the RBD images and how compressible that might be.
> But since those blocks are only written once, I am ready to invest more
> CPU cycles to reduce the size on disk.
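> (As a side note, for anyone wanting to verify the object size of an
> image: "rbd info <image>" reports the order / object size, which should
> be 4 MiB by default if I am not mistaken.)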
>
> I am simply looking for data others might have collected on similar
> use-cases.
> Also, I am still wondering if there really is nobody who has worked or
> played more with zstd, since it has become so popular in recent months...
>
>
> Regards
>
>
> Christian
>
>


-- 
Zach Underwood (RHCE,RHCSA,RHCT,UACA)
My website <http://zachunderwood.me>
advance-networking.com
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



