As suggested by someone, I tried `dump_historic_slow_ops`. There aren't many entries, and they're somewhat difficult to interpret:

"description": "osd_op(client.250533532.0:56821 13.16f 13:f6c9079e:::rbd_data.eed629ecc1f946.000000000000001c:head [stat,write 3518464~8192] snapc 0=[] ondisk+write+known_if_redirected e118835)",
"initiated_at": "2023-04-26T07:00:58.299120+0000",

"description": "osd_op(client.250533532.0:56822 13.16f 13:f6c9079e:::rbd_data.eed629ecc1f946.000000000000001c:head [stat,write 3559424~4096] snapc 0=[] ondisk+write+known_if_redirected e118835)",
"initiated_at": "2023-04-26T07:00:58.299132+0000",

"description": "osd_op(client.250533532.0:56823 13.16f 13:f6c9079e:::rbd_data.eed629ecc1f946.000000000000001c:head [stat,write 3682304~4096] snapc 0=[] ondisk+write+known_if_redirected e118835)",
"initiated_at": "2023-04-26T07:00:58.299138+0000",

"description": "osd_op(client.250533532.0:56824 13.16f 13:f6c9079e:::rbd_data.eed629ecc1f946.000000000000001c:head [stat,write 3772416~4096] snapc 0=[] ondisk+write+known_if_redirected e118835)",
"initiated_at": "2023-04-26T07:00:58.299148+0000",

"description": "osd_op(client.250533532.0:56825 13.16f 13:f6c9079e:::rbd_data.eed629ecc1f946.000000000000001c:head [stat,write 3796992~8192] snapc 0=[] ondisk+write+known_if_redirected e118835)",
"initiated_at": "2023-04-26T07:00:58.299188+0000",

"description": "osd_op(client.250533532.0:56826 13.16f 13:f6c9079e:::rbd_data.eed629ecc1f946.000000000000001c:head [stat,write 3862528~8192] snapc 0=[] ondisk+write+known_if_redirected e118835)",
"initiated_at": "2023-04-26T07:00:58.299198+0000",

"description": "osd_op(client.250533532.0:56827 13.16f 13:f6c9079e:::rbd_data.eed629ecc1f946.000000000000001c:head [stat,write 3899392~12288] snapc 0=[] ondisk+write+known_if_redirected e118835)",
"initiated_at": "2023-04-26T07:00:58.299207+0000",

"description": "osd_op(client.250533532.0:56828 13.16f 13:f6c9079e:::rbd_data.eed629ecc1f946.000000000000001c:head [stat,write 3944448~16384] snapc 0=[] ondisk+write+known_if_redirected e118835)",
"initiated_at": "2023-04-26T07:00:58.299250+0000",

"description": "osd_op(client.250533532.0:56829 13.16f 13:f6c9079e:::rbd_data.eed629ecc1f946.000000000000001c:head [stat,write 4018176~4096] snapc 0=[] ondisk+write+known_if_redirected e118835)",
"initiated_at": "2023-04-26T07:00:58.299270+0000",

There's a lot more information there, of course. I also tried `dump_ops_in_flight`: there aren't many, usually 0-10 ops at a time, but the OSD latency remains high even when the in-flight op count is low or zero.

Any ideas? I would very much appreciate it if someone could point me to the documentation on interpreting the output of these ops dumps.
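In case it helps: below is a rough Python sketch of one way to summarise the slow-op descriptions per PG and per rbd_data prefix, to see whether a single image or PG dominates. It is only a sketch: it assumes the JSON layout shown above (a top-level "ops"/"Ops" array whose entries carry "description" and, where available, "duration"), it has to run on the host with the OSD's admin socket, and the OSD id in it is a placeholder.

#!/usr/bin/env python3
"""Summarise an OSD's slow-op history per PG and per RBD object prefix.

Sketch only: assumes the dump layout shown above and a local admin socket.
"""
import json
import re
import subprocess
from collections import defaultdict

OSD_ID = "12"  # placeholder -- use the id of the slow OSD

# osd_op(client.X.0:Y <pgid> <hash>:::<object>:head [sub-ops ...] ...)
DESC_RE = re.compile(r"osd_op\(\S+\s+(\S+)\s+\S+:::(\S+):head\s+\[([^\]]*)\]")

def main():
    raw = subprocess.check_output(
        ["ceph", "daemon", f"osd.{OSD_ID}", "dump_historic_slow_ops"]
    )
    data = json.loads(raw)
    ops = data.get("ops") or data.get("Ops") or []  # key name varies between releases

    per_pg = defaultdict(lambda: [0, 0.0])   # pgid -> [count, total duration]
    per_obj = defaultdict(lambda: [0, 0.0])  # rbd_data prefix or object -> same

    for op in ops:
        m = DESC_RE.search(op.get("description", ""))
        if not m:
            continue
        pgid, obj, _subops = m.groups()
        dur = float(op.get("duration", 0.0))
        # rbd_data.<image id>.<object no>: group by the image id part
        key = obj.rsplit(".", 1)[0] if obj.startswith("rbd_data.") else obj
        for table, k in ((per_pg, pgid), (per_obj, key)):
            table[k][0] += 1
            table[k][1] += dur

    for title, table in (("per PG", per_pg), ("per image prefix/object", per_obj)):
        print(f"--- slow ops {title} ---")
        for k, (count, total) in sorted(table.items(), key=lambda kv: -kv[1][1]):
            print(f"{k:45s} count={count:4d} total_duration={total:9.3f}s")

if __name__ == "__main__":
    main()

The same parsing should also work on `dump_ops_in_flight` and `dump_historic_ops` output, since the op descriptions have the same format.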
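To map the rbd_data.<id> prefix in those descriptions back to an actual image (and from there to the VM), something like the sketch below should work: it walks the pool and compares each image's "block_name_prefix" as reported by `rbd info --format json`. The pool name is a placeholder -- the name behind pool id 13 in the PG ids can be looked up with `ceph osd lspools` -- and the prefix is the one from the dump above.

#!/usr/bin/env python3
"""Find which RBD image owns a given rbd_data.<id> object prefix.

Sketch only: POOL is a placeholder, PREFIX is taken from the slow-op
descriptions; `rbd info --format json` reports the image's block_name_prefix.
"""
import json
import subprocess

POOL = "volumes"                    # placeholder -- the pool behind pool id 13
PREFIX = "rbd_data.eed629ecc1f946"  # prefix seen in the osd_op descriptions

def rbd_json(*args):
    # run an rbd subcommand and parse its JSON output
    return json.loads(subprocess.check_output(["rbd", *args, "--format", "json"]))

owner = None
for image in rbd_json("ls", POOL):
    if rbd_json("info", f"{POOL}/{image}").get("block_name_prefix") == PREFIX:
        owner = image
        break

print(f"{PREFIX} -> {POOL}/{owner}" if owner else f"{PREFIX}: no match in {POOL}")

If the images are Cinder volumes, the volume-<uuid> in the matching image name should then identify the volume and the VM on the OpenStack side.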
/Z

On Wed, 26 Apr 2023 at 20:22, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:

> Hi,
>
> I have a Ceph 16.2.12 cluster with uniform hardware, same drive
> make/model, etc. A particular OSD is showing higher latency than usual
> in `ceph osd perf`, usually mid to high tens of milliseconds while
> other OSDs show low single digits, although its drive's I/O stats don't
> look different from those of other drives. The workload is mainly
> random 4K reads and writes, and the cluster is used as OpenStack VM
> storage.
>
> Is there a way to trace which particular PG, pool and disk image or
> object causes this OSD's excessive latency? Is there a way to tell Ceph to
>
> I would appreciate any advice or pointers.
>
> Best regards,
> Zakhar