Thanks, Igor. I mentioned earlier that, according to the OSD logs, compaction wasn't an issue. I did run `ceph-kvstore-tool` offline anyway; it completed rather quickly without any warnings or errors, but the OSD kept showing excessive latency.

I then did something rather radical: rebooted the node and redeployed all of its OSDs. Now the "slow" OSD is showing latency more in line with that of the other OSDs.

/Z

On Thu, 27 Apr 2023 at 23:10, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:

> Hi Zakhar,
>
> you might want to try offline DB compaction using ceph-kvstore-tool for
> this specific OSD.
>
> Periodically we observe OSD performance drops due to degraded RocksDB
> performance, particularly after bulk data removal/migration. Compaction
> is quite helpful in this case.
>
> Thanks,
>
> Igor
>
> On 26/04/2023 20:22, Zakhar Kirpichenko wrote:
> > Hi,
> >
> > I have a Ceph 16.2.12 cluster with uniform hardware: same drive
> > make/model, etc. A particular OSD is showing higher latency than usual
> > in `ceph osd perf`, usually mid to high tens of milliseconds while
> > other OSDs show low single digits, although its drive's I/O stats
> > don't look different from those of other drives. The workload is
> > mainly random 4K reads and writes; the cluster is being used as
> > OpenStack VM storage.
> >
> > Is there a way to trace which particular PG, pool, and disk image or
> > object causes this OSD's excessive latency? Is there a way to tell
> > Ceph to
> >
> > I would appreciate any advice or pointers.
> >
> > Best regards,
> > Zakhar
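
For reference, a minimal sketch of the offline compaction discussed above, assuming a non-containerized OSD with its data in the default /var/lib/ceph/osd/ceph-<id> directory (<id> is a placeholder for the OSD number); the OSD must be stopped while the tool runs:

    # stop the OSD so the tool gets exclusive access to its RocksDB
    systemctl stop ceph-osd@<id>

    # run offline compaction against the OSD's BlueStore key-value store
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact

    # bring the OSD back online
    systemctl start ceph-osd@<id>

On a cephadm/containerized deployment the exact steps differ: the OSD container has to be stopped and the tool run from a shell that can reach the OSD's data directory.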