Hello Ceph users,

we are seeing a strange issue on a recent Ceph installation (v17.6.2). We store data on an HDD pool, the index pool is on SSD, and each OSD keeps its WAL on an NVMe partition. Benchmarks didn't expose any issues with the cluster, but since we put production load on it we have seen constantly growing OSD latency (osd_read_latency) on the SSD disks where the index pool is located. Latency keeps growing day by day, yet the disks are utilized at less than 50%. Interestingly, when we move the index pool from SSD to NVMe disks (disk space allows it for now), OSD latency drops to zero and starts climbing again from scratch. We also noticed that any change of pg_num for the index pool (from 256 to 128, for instance) resets latency to zero as well, after which it starts growing again (https://postimg.cc/5YHk9bby).

From the client's perspective it looks like the same operation takes longer and longer each day, and the operation time drops every time we make a change to the index pool. I've enabled debug_optracker 10/0 and it shows that the OSD spends most of the time in the `queued_for_pg` state, while physical disk utilization is only about 10-20%. The logs also show that the longest operation is ListBucket, and it is strange that listing a bucket with fewer than 100,000 items takes 3-40 seconds even with 'max_keys=1'. If it matters, the client is Apache Flink doing checkpoints via the S3 protocol.
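For reference, a minimal boto3 sketch of how the slow listing can be reproduced from the client side (the endpoint, bucket name and credentials below are placeholders, not our real values):

#!/usr/bin/env python3
# Time S3 ListObjectsV2 with MaxKeys=1 against RGW, to compare the
# client-side latency with the optracker timings below.
import time
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.local:8080",  # placeholder RGW endpoint
    aws_access_key_id="ACCESS_KEY",                # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
)

for i in range(10):
    t0 = time.monotonic()
    resp = s3.list_objects_v2(Bucket="flink-checkpoints", MaxKeys=1)  # placeholder bucket name
    print(f"attempt {i}: {time.monotonic() - t0:.3f}s, "
          f"HTTP {resp['ResponseMetadata']['HTTPStatusCode']}")

The per-call times from this kind of test follow the same day-by-day growth described above.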
Here is an example of a single operation from the debug_optracker log:

2023-12-29T16:24:28.873353+0300, event: throttled, op: osd_op(client.1227774.0:22575820 7.19 7.a84f1a59 (undecoded) ondisk+write+known_if_redirected+supports_pool_eio e83833)
2023-12-29T16:24:28.861549+0300, event: header_read, op: osd_op(client.1227774.0:22575820 7.19 7.a84f1a59 (undecoded) ondisk+write+known_if_redirected+supports_pool_eio e83833)
2023-12-29T16:24:28.873358+0300, event: all_read, op: osd_op(client.1227774.0:22575820 7.19 7.a84f1a59 (undecoded) ondisk+write+known_if_redirected+supports_pool_eio e83833)
2023-12-29T16:24:28.873359+0300, event: dispatched, op: osd_op(client.1227774.0:22575820 7.19 7.a84f1a59 (undecoded) ondisk+write+known_if_redirected+supports_pool_eio e83833)
2023-12-29T16:24:28.873389+0300, event: queued_for_pg, op: osd_op(client.1227774.0:22575820 7.19 7.a84f1a59 (undecoded) ondisk+write+known_if_redirected+supports_pool_eio e83833)
2023-12-29T16:24:38.077528+0300, event: reached_pg, op: osd_op(client.1227774.0:22575820 7.19 7.a84f1a59 (undecoded) ondisk+write+known_if_redirected+supports_pool_eio e83833)
2023-12-29T16:24:38.077561+0300, event: started, op: osd_op(client.1227774.0:22575820 7.19 7:9a58f215:::.dir.68960da3-1c98-45c1-a87a-9e6c39253d27.9780498.1.83:head [stat,call rgw.guard_bucket_resharding in=36b,call rgw.bucket_prepare_op in=331b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e83833)
2023-12-29T16:24:38.077714+0300, event: waiting for subops from 59,494, op: osd_op(client.1227774.0:22575820 7.19 7:9a58f215:::.dir.68960da3-1c98-45c1-a87a-9e6c39253d27.9780498.1.83:head [stat,call rgw.guard_bucket_resharding in=36b,call rgw.bucket_prepare_op in=331b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e83833)
2023-12-29T16:24:38.146157+0300, event: sub_op_commit_rec, op: osd_op(client.1227774.0:22575820 7.19 7:9a58f215:::.dir.68960da3-1c98-45c1-a87a-9e6c39253d27.9780498.1.83:head [stat,call rgw.guard_bucket_resharding in=36b,call rgw.bucket_prepare_op in=331b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e83833)
2023-12-29T16:24:38.146166+0300, event: op_commit, op: osd_op(client.1227774.0:22575820 7.19 7:9a58f215:::.dir.68960da3-1c98-45c1-a87a-9e6c39253d27.9780498.1.83:head [stat,call rgw.guard_bucket_resharding in=36b,call rgw.bucket_prepare_op in=331b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e83833)
2023-12-29T16:24:38.146191+0300, event: sub_op_commit_rec, op: osd_op(client.1227774.0:22575820 7.19 7:9a58f215:::.dir.68960da3-1c98-45c1-a87a-9e6c39253d27.9780498.1.83:head [stat,call rgw.guard_bucket_resharding in=36b,call rgw.bucket_prepare_op in=331b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e83833)
2023-12-29T16:24:38.146204+0300, event: commit_sent, op: osd_op(client.1227774.0:22575820 7.19 7:9a58f215:::.dir.68960da3-1c98-45c1-a87a-9e6c39253d27.9780498.1.83:head [stat,call rgw.guard_bucket_resharding in=36b,call rgw.bucket_prepare_op in=331b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e83833)
2023-12-29T16:24:38.146216+0300, event: done, op: osd_op(client.1227774.0:22575820 7.19 7:9a58f215:::.dir.68960da3-1c98-45c1-a87a-9e6c39253d27.9780498.1.83:head [stat,call rgw.guard_bucket_resharding in=36b,call rgw.bucket_prepare_op in=331b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e83833)

(Note the roughly 9-second gap between queued_for_pg and reached_pg; the rest of the operation completes in well under 100 ms.)

Has anybody faced the same issue? I'd be very grateful for any ideas, because at this point I'm stuck on what to tune and where to look.

Cluster setup: 3x replication, 15 servers across 3 datacenters with datacenter as the failure domain; 7x HDD (data), 2x SSD (index), 1x NVMe (WAL + OS).
Ceph config: https://pastebin.com/pCqxXhT3
OSD read latency graph: https://postimg.cc/5YHk9bby

--
Thank you,
Roman

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx