Getting rid of FileStore solves most latency spike issues during recovery, because they are often caused by random XFS hangs (splitting dirs, or just XFS having a bad day).

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Tue, Oct 22, 2019 at 6:02 AM Mark Kirkwood <mark.kirkwood@xxxxxxxxxxxxxxx> wrote:
>
> We recently needed to reweight a couple of OSDs on one of our clusters
> (Luminous on Ubuntu, 8 hosts, 8 OSDs/host). I think we reweighted by
> approx 0.2. This was perhaps too much, as IO latency on RBD volumes
> spiked to several seconds at times.
>
> We'd like to lessen this effect as much as we can, so we are looking at
> priority and queue parameters (the OSDs are FileStore-based, with S3700
> SSD or similar NVMe journals):
>
> # priorities
> osd_client_op_priority
> osd_recovery_op_priority
> osd_recovery_priority
> osd_scrub_priority
> osd_snap_trim_priority
>
> # queue tuning
> filestore_queue_max_ops
> filestore_queue_low_threshhold
> filestore_queue_high_threshhold
> filestore_expected_throughput_ops
> filestore_queue_high_delay_multiple
> filestore_queue_max_delay_multiple
>
> My first question is this: do these parameters require the CFQ scheduler
> (as osd_disk_thread_ioprio_priority does)? We are currently using
> deadline (we have not tweaked queue/iosched/write_expire down from 5000
> to 1500, which might be worth doing).
>
> My second question is: should we consider increasing
> osd_disk_thread_ioprio_priority (and hence changing to the CFQ
> scheduler)? I usually see this parameter discussed with respect to
> scrubbing, and we are not having issues with that.
>
> regards
>
> Mark
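
To follow up on Paul's point: a quick way to confirm which objectstore each
OSD runs, plus a rough sketch of the per-OSD BlueStore migration. The OSD id
and device path below are placeholders, and the exact ceph-volume invocation
depends on how the OSDs were deployed; treat this as an outline, not a recipe.

    # which objectstore is osd.0 running?
    ceph osd metadata 0 | grep osd_objectstore

    # migrate one OSD at a time; wait for HEALTH_OK before doing the next
    ID=0                   # placeholder OSD id
    DEVICE=/dev/sdX        # placeholder data device backing osd.$ID
    ceph osd out $ID
    while ! ceph osd safe-to-destroy osd.$ID; do sleep 60; done
    systemctl stop ceph-osd@$ID
    ceph-volume lvm zap $DEVICE
    ceph osd destroy $ID --yes-i-really-mean-it
    ceph-volume lvm create --bluestore --data $DEVICE --osd-id $ID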
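
On the original problem of a ~0.2 reweight causing multi-second latency: the
usual blunt instruments are the backfill/recovery throttles, plus moving the
weight in small steps rather than one jump. The values and osd.12 below are
illustrative only:

    # throttle recovery/backfill cluster-wide; injectargs takes effect
    # immediately, no restart needed
    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'

    # step the weight gradually, waiting for HEALTH_OK between steps
    # (use "ceph osd reweight" instead if that is what you used originally)
    ceph osd crush reweight osd.12 1.75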
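
For the parameters Mark lists, a sketch of what the knobs look like in
ceph.conf. The values are examples to show the shape of the options, not
recommendations (and yes, the misspelled "threshhold" is the actual option
name in Luminous):

    [osd]
    # client vs recovery op weighting in the OSD's internal op queue
    osd_client_op_priority = 63
    osd_recovery_op_priority = 1

    # FileStore queue throttling: delays are injected between the low and
    # high thresholds (fractions of filestore_queue_max_ops)
    filestore_queue_max_ops = 50
    filestore_queue_low_threshhold = 0.3
    filestore_queue_high_threshhold = 0.9
    filestore_expected_throughput_ops = 200
    filestore_queue_high_delay_multiple = 2
    filestore_queue_max_delay_multiple = 10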
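
On the scheduler questions: osd_disk_thread_ioprio_class/_priority only take
effect under CFQ (they are applied via ioprio_set(), which the other
schedulers ignore), whereas the osd_*_priority options above are weights
inside Ceph's own op queue and work regardless of the kernel scheduler.
Checking and changing per device looks like this (sdX is a placeholder, and
the sysfs writes do not persist across reboots):

    cat /sys/block/sdX/queue/scheduler          # e.g. "noop [deadline] cfq"
    echo cfq > /sys/block/sdX/queue/scheduler   # switch to CFQ at runtime

    # or, staying on deadline, tighten write expiry as Mark suggests
    echo 1500 > /sys/block/sdX/queue/iosched/write_expire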