Re: Is there a way to throttle faster osds due to slow ops?

Yes, you can override the capacity for an individual OSD using "ceph config set
osd.N osd_mclock_max_capacity_iops_ssd <new_value>".
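
For example, a minimal sketch (osd.91 and the 20000 figure below are only
placeholders, not recommended values):

  # value currently stored in the config db for that OSD
  ceph config get osd.91 osd_mclock_max_capacity_iops_ssd

  # override it with the per-OSD share of the drive's measured IOPS
  ceph config set osd.91 osd_mclock_max_capacity_iops_ssd 20000

  # check what the running daemon is actually using
  ceph tell osd.91 config get osd_mclock_max_capacity_iops_ssd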

On Tue, Oct 1, 2024 at 3:45 PM Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
wrote:

> Dug a bit further: it seems the osd_mclock_max_capacity_iops_ssd value in
> the config db comes from the OSD bench run by a single OSD. However, I have
> 4 OSDs on my 15TB NVMe, and if I run the bench in parallel on the 4 OSDs of
> one NVMe drive, the result is a quarter of that.
>
> Is it safe to divide this value by 4 in the config db?
>
> ________________________________
> From: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
> Sent: Tuesday, October 1, 2024 1:47 PM
> To: Ceph Users <ceph-users@xxxxxxx>
> Subject:  Is there a way to throttle faster osds due to slow
> ops?
>
> Hi,
>
> We have extended our clusters with some new nodes, and currently it is
> impossible to remove an NVMe drive holding the index pool from any old node
> without generating slow ops and degrading cluster performance.
>
> My current removal approach, on this Quincy non-cephadm cluster, is to
> CRUSH-reweight the OSD to 0 and then remove it. This data movement causes
> slow ops the whole time the NVMe OSD is being drained.
>
> My guess is that the faster new drives push the old servers' NVMes harder,
> which causes high iowait on the old NVMes, so I want to throttle the new
> NVMes somehow. Is that possible with mClock or in any other way? (Max
> backfills, OSD recovery ops and recovery op priority are already set to 1,
> and the balancer max misplaced ratio is 0.01.)
>
> This is what one of the slow OSDs logs during the removal:
> https://gist.github.com/Badb0yBadb0y/15b51e524a47dfbd2728bbabc18238fc#file-gistfile1-txt
>
> 2024-10-01T11:46:29.601+0700 7f29bf4f8640  0
> bluestore(/var/lib/ceph/osd/ceph-91) log_latency_fn slow operation observed
> for _txc_committed_kv, latency = 5.583707809s, txc = 0x55af2bd2e300
> 2024-10-01T11:46:29.601+0700 7f29bf4f8640  0
> bluestore(/var/lib/ceph/osd/ceph-91) log_latency_fn slow operation observed
> for _txc_committed_kv, latency = 5.541916847s, txc = 0x55af1a035b00
> 2024-10-01T11:46:29.601+0700 7f29bf4f8640  0
> bluestore(/var/lib/ceph/osd/ceph-91) log_latency_fn slow operation observed
> for _txc_committed_kv, latency = 5.533919334s, txc = 0x55af19fafb00
> 2024-10-01T11:46:29.601+0700 7f29bf4f8640  0
> bluestore(/var/lib/ceph/osd/ceph-91) log_latency_fn slow operation observed
> for _txc_committed_kv, latency = 6.904534340s, txc = 0x55af49814c00
> 2024-10-01T11:46:29.601+0700 7f29bf4f8640  0
> bluestore(/var/lib/ceph/osd/ceph-91) log_latency_fn slow operation observed
> for _txc_committed_kv, latency = 6.911001205s, txc = 0x55af24b19800
> 2024-10-01T11:46:29.601+0700 7f29bf4f8640  0
> bluestore(/var/lib/ceph/osd/ceph-91) log_latency_fn slow operation observed
> for _txc_committed_kv, latency = 5.597061634s, txc = 0x55af4fe0fb00
> 2024-10-01T11:46:30.889+0700 7f29becf7640  4 rocksdb:
> [db/db_impl/db_impl_write.cc:1736] [default] New memtable created with log
> file: #280327. Immutable memtables: 0.
> 2024-10-01T11:46:30.889+0700 7f29becf7640  4 rocksdb:
> [db/column_family.cc:983] [default] Increasing compaction threads because
> we have 18 level-0 files
> 2024-10-01T11:46:30.889+0700 7f29c4512640  4 rocksdb: (Original Log Time
> 2024/10/01-11:46:30.893378) [db/db_impl/db_impl_compaction_flush.cc:2394]
> Calling FlushMemTableToOutputFile with column family [default], flush slots
> available 1, compaction slots available 2, flush slots scheduled 1,
> compaction slots scheduled 2
> 2024-10-01T11:46:30.889+0700 7f29c4512640  4 rocksdb:
> [db/flush_job.cc:335] [default] [JOB 5604] Flushing memtable with next log
> file: 280327
> 2024-10-01T11:46:30.889+0700 7f29c4512640  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1727757990893428, "job": 5604, "event": "flush_started",
> "num_memtables": 1, "num_entries": 2437269, "num_deletes": 2384624,
> "total_data_size": 233695787, "memory_usage": 278437952, "flush_reason":
> "Write Buffer Full"}
> 2024-10-01T11:46:30.889+0700 7f29c4512640  4 rocksdb:
> [db/flush_job.cc:364] [default] [JOB 5604] Level-0 flush table #280328:
> started
>
> Thank you
>

-- 

Sridhar Seshasayee

Partner Engineer

Red Hat <https://www.redhat.com>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



