bluestore_min_alloc_size and bluefs_shared_alloc_size

Summary
----------
The relationship between the values configured for bluestore_min_alloc_size and bluefs_shared_alloc_size is reported to affect space amplification, partial overwrites in erasure-coded pools, and usable capacity as an OSD becomes more fragmented and/or more full.
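As a rough worked example of the space amplification side (assuming the pre-Pacific HDD default of bluestore_min_alloc_size_hdd = 65536): a 4 KiB object still occupies one full 64 KiB allocation unit, roughly 16x amplification for that object, whereas with a 4096-byte min_alloc_size the same object fits its allocation unit exactly.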


Previous discussions including this topic
----------------------------------------
comment #7 in bug 63618 in Dec 2023 - https://tracker.ceph.com/issues/63618#note-7

pad writeup related to bug 62282 likely from late 2023 - https://pad.ceph.com/p/RCA_62282

email sent 13 Sept 2023 in mail list discussion of cannot create new osd - https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/5M4QAXJDCNJ74XVIBIFSHHNSETCCKNMC/

comment #9 in bug 58530 likely from early 2023 - https://tracker.ceph.com/issues/58530#note-9

email sent 30 Sept 2021 in mail list discussion of flapping osds - https://www.mail-archive.com/ceph-users@xxxxxxx/msg13072.html

email sent 25 Feb 2020 in mail list discussion of changing allocation size - https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/B3DGKH6THFGHALLX6ATJ4GGD4SVFNEKU/


Current situation
-----------------
We have three Ceph clusters that were originally built via cephadm on Octopus and later upgraded to Pacific. All OSDs are on HDD (we will be moving wal+db to SSD), and the OSDs were resharded after the upgrade to enable RocksDB sharding.

The value for bluefs_shared_alloc_size has remained unchanged at 65536.

The value for bluestore_min_alloc_size_hdd was 65536 under Octopus but is reported as 4096 by ceph daemon osd.<id> config show under Pacific. However, the OSD label after the upgrade to Pacific still shows 65536 for bfm_bytes_per_block. BitmapFreelistManager.h in the Ceph source (src/os/bluestore/BitmapFreelistManager.h) indicates that bytes_per_block is bdev_block_size. This suggests that the on-disk layout of the OSD is still based on a 65536-byte allocation unit, despite the ceph daemon command now reporting 4096. This interpretation is supported by the Minimum Allocation Size section of the BlueStore configuration reference for Quincy (https://docs.ceph.com/en/quincy/rados/configuration/bluestore-config-ref/#minimum-allocation-size).
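In case it is useful to others wanting to compare the same two views, something like the following should show both values (osd.0 and the path are only examples; on a cephadm deployment ceph-bluestore-tool has to be run inside the OSD's container, e.g. via cephadm shell):

  # runtime/config view of the allocation sizes as the OSD reports them
  ceph daemon osd.0 config show | grep -E 'min_alloc_size|shared_alloc_size'

  # value recorded in the on-disk label when the OSD was created
  ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0 | grep bfm_bytes_per_block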


Questions
----------
What are the pros and cons of the following three cases, each with two variations - co-located wal+db on the HDD versus separate wal+db on SSD (a configuration sketch for case 2 follows the list):
1) bluefs_shared_alloc_size, bluestore_min_alloc_size, and bfm_bytes_per_block all equal
2) bluefs_shared_alloc_size greater than but a multiple of bluestore_min_alloc_size with bfm_bytes_per_block equal to bluestore_min_alloc_size
3) bluefs_shared_alloc_size greater than but a multiple of bluestore_min_alloc_size with bfm_bytes_per_block equal to bluefs_shared_alloc_size
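For concreteness, a minimal sketch of how case 2 could be expressed in the config database (assuming the OSDs are redeployed afterwards, since bluestore_min_alloc_size_* only takes effect when an OSD is created, and assuming bfm_bytes_per_block is simply what mkfs records rather than a separately settable option):

  # only affects OSDs created after this point
  ceph config set osd bluestore_min_alloc_size_hdd 4096
  ceph config set osd bluefs_shared_alloc_size 65536

Case 1 would just use the same value (e.g. 65536) for both options.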