On Fri, 2022-08-19 at 15:48 +0200, Stefan Kooman wrote:
> On 8/19/22 15:04, Frank Schilder wrote:
> > Hi Chris,
> >
> > looks like your e-mail stampede is over :) I will cherry-pick some
> > questions to answer, other things either follow or you will figure
> > it out with the docs and trial-and-error. The cluster set-up is
> > actually not that bad.
> >
> > 1) Set osd_op_queue_cut_off = high on global level. Even though it's
> > prefixed with osd_, it seems to actually be used by more daemons. After
> > setting it just on the OSDs on my mimic cluster, the MDSes crashed
> > until I set it on global level.
> >
> > 2) I think your PG nums on the pools are fine, you should aim for
> > between 100-200 PGs per OSD. On the metadata pool you can increase
> > it if you convert your OSDs one-by-one to LVM (mimic and later) and
> > deploy 2 or 4 OSDs per NVMe drive.
>
> If you are not running the latest luminous, I would advise doing so if
> you first want to make improvements / changes before upgrading.
>

Hi Stefan,

Thanks for the reply. I don't think we're on the latest, so that makes
sense, thanks.

> LVM support was gained by ceph-volume in 12.2.2 and support for
> bluestore added [1]. It was the first release we started with. So if
> you want, and I guess you are running > 12.2.2, it's possible to change
> to bluestore with LVM before upgrading to a newer release. I would
> recommend (certainly with many small files) to set the following
> property (nowadays the default in Ceph):
>
> # 4096 B instead of 16K (SSD) / 64K (HDD) to avoid large overhead for
> # small (cephFS) files
> bluestore_min_alloc_size_ssd = 4096
> bluestore_min_alloc_size_hdd = 4096
>
> Otherwise you will end up wasting a lot of space and might even run out
> of space during the upgrade. You cannot change that parameter afterwards
> (or at least it won't have any effect) ... so make sure you have that
> set before converting from filestore to bluestore.
>

Oh, great info, thanks very much! We do have lots of small files. All
the OSDs are on filestore at the moment, so I was definitely planning
on moving over to bluestore as part of trying to address these issues
(and expand the cluster, etc).

> # MEMORY ALLOCATOR
> bluestore_allocator = bitmap
> bluefs_allocator = bitmap
>
> Nowadays hybrid is the default memory allocator, but it is not
> available in luminous, and bitmap is better than stupid (the default in
> older luminous releases).
>
> I would postpone changing to bluestore though. You will need to do a
> conversion from Octopus -> Pacific, and this might take a lot of time
> for drives with a lot of OMAP. You won't need to do any of that when
> changing filestore -> bluestore on Pacific, and you benefit from the
> sharding in RocksDB and some other improvements in OMAP-related
> functionality (which CephFS will also benefit from). Then you also
> don't need to manually perform all the sharding for RocksDB. We have
> followed the L->M->N->O path, but in hindsight it would have been
> better to skip O and move directly to P. Not that O is bad, not at all
> actually; it's just that there is a lot of extra work involved and
> Pacific is mature enough by now. We are waiting for 16.2.10 to make
> the move to P. All the proper settings are there now by default.
>

OK, so basically it sounds like I should stick with filestore, upgrade
the cluster to Pacific to inherit the newer settings, and then do the
conversion to bluestore, which will avoid the manual sharding etc. Did
I understand correctly?
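Assuming I did, here is a rough sketch of what I think the config plus
the eventual per-OSD filestore -> bluestore rebuild on Pacific would
look like, purely from my reading of the docs. The OSD id (12), the
device names (/dev/sdX, /dev/nvme0n1) and the config section placement
are placeholders/assumptions on my part, so please shout if any of this
looks wrong:

# ceph.conf additions before rebuilding any OSDs
[global]
osd_op_queue_cut_off = high

[osd]
bluestore_min_alloc_size_ssd = 4096
bluestore_min_alloc_size_hdd = 4096
bluestore_allocator = bitmap
bluefs_allocator = bitmap

# rebuild one OSD at a time, reusing its id
ceph osd out 12
while ! ceph osd safe-to-destroy osd.12 ; do sleep 60 ; done
systemctl stop ceph-osd@12
ceph osd destroy 12 --yes-i-really-mean-it
ceph-volume lvm zap /dev/sdX --destroy
ceph-volume lvm create --bluestore --data /dev/sdX --osd-id 12

# for the NVMe metadata-pool drives, I'm assuming something like this
# would split each drive into two OSDs as Frank suggested
ceph-volume lvm batch --bluestore --osds-per-device 2 /dev/nvme0n1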
I expect I'll probably send a few more emails to the list before I
undertake the upgrades, to try to understand as many of the
optimisations, and avoid as many of the pitfalls, as possible.

Thanks very much!

-c

> My 2 cents.
>
> Gr. Stefan
>
> [1]. https://docs.ceph.com/en/latest/releases/luminous/#id44

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx