On 8/19/22 15:04, Frank Schilder wrote:
Hi Chris,
looks like your e-mail stampede is over :) I will cherry-pick some questions to answer, other things either follow or you will figure it out with the docs and trial-and-error. The cluster set-up is actually not that bad.
1) Set osd_op_queue_cut_off = high on global level. Even though it's prefixed with osd_, it seems to be used by other daemons as well. After setting it just on the OSDs on my mimic cluster, the MDSes crashed until I set it at global level.
2) I think your PG numbers on the pools are fine; you should aim for between 100 and 200 PGs per OSD. On the metadata pool you can increase it if you convert your OSDs one by one to LVM (mimic and later) and deploy 2 or 4 OSDs per NVMe drive.
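To make point 1) above concrete, a minimal sketch, assuming you still distribute settings via ceph.conf (the usual way on luminous/mimic) and restart the daemons afterwards so it is picked up:
[global]
# applies to OSDs, MDSes, etc.
osd_op_queue_cut_off = high
On mimic and newer you can also put it in the cluster's central config with "ceph config set global osd_op_queue_cut_off high".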
If you are not running the latest luminous point release, I would advise
doing so if you want to make improvements / changes first, before
upgrading.
ceph-volume gained LVM support in 12.2.2, along with support for
bluestore [1]. That was the first release we started with. So if you
want, and I guess you are running > 12.2.2, it's possible to change to
bluestore with LVM before upgrading to a newer release. I would
recommend (certainly with many small files) setting the following
properties (nowadays the default in Ceph):
# 4096 B instead of 16K (SSD) / 64K (HDD) to avoid large overhead for
small (cephFS) files
bluestore_min_alloc_size_ssd = 4096
bluestore_min_alloc_size_hdd = 4096
Otherwise you will end up wasting a lot of space and might even run out
of space during the upgrade. You cannot change that parameter afterwards
(or at least it won't have any effect) ... so make sure you have it set
before converting from filestore to bluestore.
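For reference, a per-OSD filestore -> bluestore conversion could look roughly like the sketch below. This assumes a single data device per OSD and a recent luminous ceph-volume; osd.N and /dev/sdX are placeholders, and the min_alloc_size settings above must already be in ceph.conf so the freshly created OSD picks them up at mkfs time:
# repeat per OSD, one at a time
ceph osd out N
# wait until all PGs are active+clean again, then:
systemctl stop ceph-osd@N
ceph osd destroy N --yes-i-really-mean-it
# wipe the old filestore device and recreate it as an LVM-based bluestore OSD, reusing the ID
ceph-volume lvm zap /dev/sdX --destroy
ceph-volume lvm create --bluestore --data /dev/sdX --osd-id N
Draining and rebuilding one OSD at a time keeps the cluster fully redundant throughout; it is also the point where you could split an NVMe drive into 2 or 4 OSDs as Frank suggested.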
# BLUESTORE / BLUEFS ALLOCATOR
bluestore_allocator = bitmap
bluefs_allocator = bitmap
Nowadays hybrid is the default allocator, but it is not available in
luminous, and bitmap is better than stupid (the default in older
luminous releases).
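A quick way to check what a running OSD actually uses (osd.0 is just an example, and this assumes you have access to the admin socket on the OSD host):
# the allocator is chosen at OSD start, so restart OSDs after changing these settings
ceph daemon osd.0 config show | grep -E 'bluestore_allocator|bluefs_allocator'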
I would postpone changing to bluestore though. You will otherwise need
to do a conversion (of the OMAP data) when going from Octopus ->
Pacific, and this might take a lot of time for drives with a lot of
OMAP. You won't need any of that when changing filestore -> bluestore
while already on Pacific, and you benefit from the sharding in RocksDB
and some other improvements with regard to OMAP-related functionality
(which CephFS will also benefit from). Then you also don't need to
manually perform all the sharding for RocksDB. We followed the
L->M->N->O path, but in hindsight it would have been better to skip O
and move directly to P. Not that O is bad, not at all actually, it's
just that there is a lot of extra work involved and Pacific is mature
enough by now. We are waiting for 16.2.10 to make the move to P. All the
proper settings are there now by default.
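For completeness: OSDs created before Pacific do not pick up the new RocksDB sharding by themselves. My understanding is that you reshard them one at a time with the OSD stopped, roughly like this (N is a placeholder; the sharding spec below is the Pacific default at the time of writing, so check the BlueStore docs of your exact release before copying it):
systemctl stop ceph-osd@N
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-N \
    --sharding="m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P" \
    reshard
systemctl start ceph-osd@N
OSDs (re)created on Pacific, e.g. during a filestore -> bluestore conversion done there, get this sharding from the start, which is exactly why doing the conversion on P saves a step.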
My 2 cents.
Gr. Stefan
[1] https://docs.ceph.com/en/latest/releases/luminous/#id44