Re: What is client request_load_avg? Troubleshooting MDS issues on Luminous

On 8/19/22 15:04, Frank Schilder wrote:
Hi Chris,

looks like your e-mail stampede is over :) I will cherry-pick some questions to answer; other things either follow from that or you will figure them out with the docs and trial and error. The cluster set-up is actually not that bad.

1) Set osd_op_queue_cut_off = high at the global level. Even though it's prefixed with osd_, it seems to be used by other daemons as well. After setting it only on the OSDs on my mimic cluster, the MDSes crashed until I set it globally (see the snippet below).
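
A minimal sketch of how that could look in ceph.conf (the point is the [global] placement; on Mimic and later you could alternatively run "ceph config set global osd_op_queue_cut_off high"):

# ceph.conf
[global]
# osd_-prefixed, but set globally so the MDS daemons pick it up too
osd_op_queue_cut_off = high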

2) I think your PG nums on the pools are fine; you should aim for 100-200 PGs per OSD. On the metadata pool you can increase pg_num if you convert your OSDs one by one to LVM (mimic and later) and deploy 2 or 4 OSDs per NVMe drive (see the sketch below).
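
A rough sizing sketch with made-up numbers (the 100 OSDs, replica size 3, pool name and device path are placeholders, not taken from this thread):

# PGs per OSD ~= sum over pools of (pg_num * replica_size) / number_of_OSDs
# e.g. a pool with pg_num = 4096 and size = 3 on 100 OSDs adds 4096 * 3 / 100 ~= 123 PGs per OSD
ceph osd df                               # the PGS column shows the current per-OSD count
ceph osd pool get cephfs_metadata pg_num  # pool name is a placeholder

# deploying 2 OSDs per NVMe with ceph-volume (device path is a placeholder)
ceph-volume lvm batch --osds-per-device 2 /dev/nvme0n1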

If you are not running the latest Luminous point release, I would advise upgrading to it first if you want to make improvements / changes before moving to a newer major release.
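
A quick way to check what each daemon is actually running (the command is available since Luminous):

# per-daemon-type breakdown of running versions
ceph versions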

ceph-volume gained LVM support in 12.2.2, along with support for bluestore [1]. That was the first release we started with. So if you want, and I guess you are running > 12.2.2, it's possible to change to bluestore with LVM before upgrading to a newer release. I would recommend (certainly with many small files) setting the following properties (nowadays the defaults in Ceph):

# 4096 B instead of 16K (SSD) / 64K (HDD) to avoid large overhead for small (cephFS) files
bluestore_min_alloc_size_ssd = 4096
bluestore_min_alloc_size_hdd = 4096

Otherwise you will end up wasting a lot of space and might even run out of space during the upgrade. You cannot change that parameter afterwards (or at least it won't have any effect), because it is baked in when the OSD is created ... so make sure you have it set before converting from filestore to bluestore.
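
A sketch of a per-OSD filestore -> bluestore conversion, roughly following the replace-one-OSD-at-a-time approach from the docs (OSD id 7 and /dev/sdX are placeholders; double-check the steps against the docs for your exact release):

# make sure the alloc-size settings above are in ceph.conf *before* re-creating the OSD
ceph osd out 7
# wait until the data has been re-replicated off the OSD, then:
systemctl stop ceph-osd@7
ceph osd destroy 7 --yes-i-really-mean-it
ceph-volume lvm zap --destroy /dev/sdX
ceph-volume lvm create --bluestore --data /dev/sdX --osd-id 7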

# MEMORY ALLOCATOR
bluestore_allocator = bitmap
bluefs_allocator = bitmap

Nowadays hybrid is the default allocator, but it is not available in Luminous; bitmap is better than stupid (the default in older Luminous releases).
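
A quick way to confirm what a running OSD actually picked up (osd.0 is a placeholder; run this on the host where that OSD lives):

# query the daemon's admin socket
ceph daemon osd.0 config get bluestore_allocator
ceph daemon osd.0 config get bluefs_allocator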

I would postpone changing to bluestore, though. If you convert now, you will still need to do a conversion when going from Octopus -> Pacific, and that can take a lot of time for drives with a lot of OMAP. You won't need any of that if you change filestore -> bluestore once you are on Pacific, and you will benefit from the sharding in RocksDB and some other improvements to OMAP-related functionality (which CephFS will also benefit from). Then you also don't need to manually perform all the sharding for RocksDB (the sketch below shows what that manual step looks like).

We have followed the L -> M -> N -> O path, but in hindsight it would have been better to skip O and move directly to P. Not that O is bad, not at all actually; it's just that there is a lot of extra work involved and Pacific is mature enough by now. We are waiting for 16.2.10 to make the move to P. All the proper settings are there now by default.
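
For reference, this is roughly what the manual resharding on Pacific looks like for an OSD whose RocksDB was created before the sharded layout (the OSD id and the idea of reading the sharding spec from bluestore_rocksdb_cfs are assumptions; check the Pacific docs before running anything like this):

# stop the OSD, reshard its RocksDB column families, start it again
systemctl stop ceph-osd@7
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-7 \
    --sharding "$(ceph config get osd.7 bluestore_rocksdb_cfs)" reshard
systemctl start ceph-osd@7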

My 2 cents.

Gr. Stefan

[1] https://docs.ceph.com/en/latest/releases/luminous/#id44


