Re: bluestore_min_alloc_size_hdd on Octopus (15.2.10) / XFS formatted RBDs

Hi David,


On 4/7/2021 7:43 PM, David Orman wrote:
Now that the hybrid allocator appears to be enabled by default in
Octopus, is it safe to change bluestore_min_alloc_size_hdd to 4k from
64k on Octopus 15.2.10 clusters, and then redeploy every OSD to switch
to the smaller allocation size, without massive performance impact for
RBD? We're seeing a lot of storage usage amplification on HDD-backed
EC 8+3 clusters, which lines up with many of the mailing
list posts we've seen here. Upgrading to Pacific before making this
change is also a possibility once a more stable release arrives, if
that's necessary.

I wouldn't recommend switching to a 4K min alloc size for pre-Pacific clusters. Additional fixes besides the hybrid allocator are required to avoid performance degradation.

And we decided not to backport those changes to Octopus, as they look too complicated.
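
For reference, a rough sketch of how the change could be staged and verified through the centralized config database (osd.0 below is just a placeholder id). Keep in mind that min_alloc_size is baked in when an OSD is created, so a new value only applies to OSDs deployed or redeployed afterwards:

  # value that newly created HDD OSDs would pick up
  ceph config get osd bluestore_min_alloc_size_hdd

  # stage 4K for future OSD (re)deployments
  ceph config set osd bluestore_min_alloc_size_hdd 4096

  # configured value at a running OSD (the on-disk value is fixed at mkfs time)
  ceph daemon osd.0 config get bluestore_min_alloc_size_hdd

But again - on Octopus I'd keep the 64K default.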



Second part of this question - we are using RBDs currently on the
clusters impacted. These have XFS filesystems on top, which detect the
sector size of the RBD as 512byte, and XFS has a block size of 4k.
With the default of 64k for bluestore_min_alloc_size_hdd, let's say a
1G file is written out to the XFS filesystem backed by the RBD. On the
ceph side, is this being seen as a lot of 4k objects, so that
significant space waste occurs, or is RBD able to coalesce these
into 64k objects, even though XFS is using a 4k block size?

XFS details below, you can see the allocation groups are quite large:

meta-data=/dev/rbd0              isize=512    agcount=501, agsize=268435440 blks
          =                       sectsz=512   attr=2, projid32bit=1
          =                       crc=1        finobt=1, sparse=1, rmapbt=0
          =                       reflink=1
data     =                       bsize=4096   blocks=134217728000, imaxpct=1
          =                       sunit=16     swidth=16 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
          =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

I'm curious if people have been tuning XFS on RBD for better
performance, as well.

I presume that the actual write block sizes are determined primarily by the application - e.g. whether buffered or direct I/O is in use and how often flush/sync calls are made.

Speculating rather than knowing for sure, though...
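
Just to illustrate where the amplification on EC 8+3 most likely comes from - rough arithmetic only, assuming the default 4 MiB RBD object size and the 64K min_alloc_size:

  full 4 MiB object:    4 MiB / 8 data shards = 512 KiB per shard -> a multiple of 64K, no padding
  tiny/partial object:  8 data + 3 coding shards, each rounded up to 64K
                        -> 11 x 64 KiB = 704 KiB allocated for as little as 4 KiB of user data

So fully written RBD objects see little overhead beyond the nominal 3/8 parity, while small or sparsely written ones can be amplified dramatically - which would match the usage amplification you're seeing.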



Thank you!
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


