Re: 16.2.10: ceph osd perf always shows high latency for a specific OSD

Thanks for this!

The drive doesn't show increased utilization on average, but it does
sporadically get more I/O than other drives, usually in short bursts. I am
now trying to find a way to trace this to a specific PG, pool and
object(s) – not sure if that is possible.
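
A rough starting point might be to list the PGs mapped to that OSD and to
dump its recent slow ops from the admin socket, something along these lines
(osd.12 is just a placeholder for the slow OSD's id, and dump_historic_ops
only keeps the last few slow ops, so it may well miss very short bursts):

  # PGs currently mapped to the suspect OSD
  ceph pg ls-by-osd osd.12

  # on the OSD's host: the slowest recent ops, including pool/PG and object names
  ceph daemon osd.12 dump_historic_ops

  # map a suspicious object name back to its PG and acting set
  ceph osd map <pool> <object>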

/Z

On Fri, 7 Oct 2022, 12:17 Dan van der Ster, <dvanders@xxxxxxxxx> wrote:

> Hi Zakhar,
>
> I can back up what Konstantin has reported -- we occasionally have
> HDDs performing very slowly even though all SMART tests come back
> clean. Besides ceph osd perf showing high latency, you could also see
> a high %util in iostat.
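>
> For example, watching the extended stats for a while with something like
>
>   iostat -x 5
>
> and comparing the %util and await columns of the suspect drive against
> its neighbours should make the sick one stand out.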
>
> We normally replace those HDDs -- usually by draining and zeroing
> them, then putting them back into production elsewhere (e.g. in a
> different cluster or some other service). I don't have statistics on
> how often those sick drives come back to full performance -- if they
> do, that could indicate the cause was a poor physical connection,
> vibration, etc. But I do recall some drives that came back repeatedly
> as "sick" but not dead, with clean SMART tests.
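>
> For reference, a typical drain-and-remove sequence might look roughly
> like this (osd.12 and /dev/sdX are placeholders):
>
>   ceph osd out osd.12              # let the data migrate off the OSD
>   ceph osd safe-to-destroy osd.12  # repeat until this reports it is safe
>   ceph osd purge osd.12 --yes-i-really-mean-it
>   wipefs -a /dev/sdX               # plus dd from /dev/zero to fully zero it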
>
> If you have time, you can dig deeper with increased bluestore debug
> levels. In our environment this happens often enough that we simply
> drain, replace, and move on.
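>
> If you do go that route, bumping the levels on just the one OSD and
> resetting them afterwards is usually enough, e.g. (osd.12 again being a
> placeholder):
>
>   ceph tell osd.12 config set debug_bluestore 20
>   ceph tell osd.12 config set debug_osd 10
>   # reproduce the latency, check that OSD's log, then restore the defaults
>   ceph tell osd.12 config set debug_bluestore 1/5
>   ceph tell osd.12 config set debug_osd 1/5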
>
> Cheers, dan
>
> On Fri, Oct 7, 2022 at 9:41 AM Zakhar Kirpichenko <zakhar@xxxxxxxxx>
> wrote:
> >
> > Unfortunately, that isn't the case: the drive is perfectly healthy and,
> > according to all measurements I did on the host itself, it isn't any
> > different from any other drive on that host size-, health- or
> > performance-wise.
> >
> > The only difference I noticed is that this drive sporadically does
> > more I/O than other drives for a split second, probably due to
> > specific PGs placed on its OSD, but the average I/O pattern is very
> > similar to other drives and OSDs, so it's somewhat unclear why the
> > specific OSD is consistently showing much higher latency. It would be
> > good to figure out what exactly is causing these I/O spikes, but I'm
> > not yet sure how to do that.
> >
> > /Z
> >
> > On Fri, 7 Oct 2022 at 09:24, Konstantin Shalygin <k0ste@xxxxxxxx> wrote:
> >
> > > Hi,
> > >
> > > When you see that one of 100 drives' perf is unusually different, it
> > > may mean 'this drive is not like the others', and the drive should be
> > > replaced.
> > >
> > >
> > > k
> > >
> > > Sent from my iPhone
> > >
> > > > On 7 Oct 2022, at 07:33, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:
> > > >
> > > > Anyone, please?
> > >
> > >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



