Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

Thanks Mark.

With the help of the crowd on Telegram, we found that (at least here)
the drive's volatile write cache needs to be disabled like this:

```
# Set every SCSI disk's cache mode to write-through (disables the volatile write cache)
for x in /sys/class/scsi_disk/*/cache_type; do echo 'write through' > $x; done
```
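
To verify that the setting actually reached the drives, you can read it back from sysfs and cross-check with hdparm (the device name below is a placeholder):

```
# Should now print "write through" for every SCSI disk
cat /sys/class/scsi_disk/*/cache_type
# Should report: write-caching = 0 (off)
hdparm -W /dev/sdX
```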

This disables the cache (we confirmed it afterwards with hdparm, as
above), but more importantly fio with --fsync=1 --direct=1 now gives
~400 IOPS (as opposed to ~80 IOPS out of the box).
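
For anyone who wants to reproduce the comparison, a single-job 4k sync-write test along these lines should show the difference (block size, runtime and the device name are placeholders here, and writing to the raw device is destructive, so use a scratch disk):

```
# 4k random writes, queue depth 1, fsync after every write
fio --name=synctest --filename=/dev/sdX --rw=randwrite --bs=4k \
    --iodepth=1 --numjobs=1 --fsync=1 --direct=1 \
    --runtime=60 --time_based
```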

And commit_latency_ms drops from 80ms to 3ms :-)

We'll do more testing here, but this looks like a magic switch. (We
should consider documenting or automating this to some extent, IMHO;
a rough sketch of the automation follows below.)
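
On the "automating" part, one untested option would be a udev rule so the setting survives reboots and hotplug (the rule filename is arbitrary, and this matches every SCSI disk, so on mixed-drive hosts you'd want to add a vendor/model match):

```
cat > /etc/udev/rules.d/99-scsi-write-through.rules <<'EOF'
# Sketch only: force write-through on all SCSI disks at add/change time
ACTION=="add|change", SUBSYSTEM=="scsi_disk", ATTR{cache_type}="write through"
EOF
udevadm control --reload-rules && udevadm trigger --subsystem-match=scsi_disk
```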

Cheers, Dan

On Thu, Nov 4, 2021 at 11:48 AM Mark Nelson <mnelson@xxxxxxxxxx> wrote:
>
> Hi Dan,
>
>
> I can't speak for those specific Toshiba drives, but we have absolutely
> seen very strange behavior (sometimes with cache enabled and sometimes
> not) with different drives and firmwares over the years from various
> manufacturers.  There was one especially bad case from back in the
> Inktank days, but my memory is a bit fuzzy.  I think we were seeing
> weird periodic commit latency spikes that grew worse over time.  That
> one might have been cache related.  I believe we ended up doing a lot of
> tests with blktrace and iowatcher to show the manufacturer what we were
> seeing, but I don't recall if anything ever got fixed.
>
>
> Mark
>
>
> On 11/4/21 5:33 AM, Dan van der Ster wrote:
> > Hello Benoît, (and others in this great thread),
> >
> > Apologies for replying to this ancient thread.
> >
> > We have been debugging similar issues during an ongoing migration to
> > new servers with TOSHIBA MG07ACA14TE hdds.
> >
> > We see a similar commit_latency_ms issue on the new drives (~60ms in
> > our env vs ~20ms for some old 6TB Seagates).
> > However, disabling the write cache (hdparm -W 0) made absolutely no
> > difference for us.
> >
> > So we're wondering:
> > * Are we running the same firmware as you? (We have 0104). I wonder if
> > Toshiba has changed the implementation of the cache in the meantime...
> > * Is anyone aware of some HBA or other setting in the middle that
> > might be masking this setting from reaching the drive?
> >
> > Best Regards,
> >
> > Dan
> >
> >
> >
> > On Wed, Jun 24, 2020 at 9:44 AM Benoît Knecht <bknecht@xxxxxxxxxxxxx> wrote:
> >> Hi,
> >>
> >> We have a Nautilus (14.2.9) Ceph cluster with two types of HDDs:
> >>
> >> - TOSHIBA MG07ACA14TE   [1]
> >> - HGST HUH721212ALE604  [2]
> >>
> >> They're all bluestore OSDs with no separate DB+WAL and part of the same pool.
> >>
> >> We noticed that while the HGST OSDs have a commit latency of about 15ms, the Toshiba OSDs hover around 150ms (these values come from the `ceph_osd_commit_latency_ms` metric in Prometheus).
> >>
> >> On paper, it seems like those drives have very similar specs, so it's not clear to me why we're seeing such a large difference when it comes to commit latency.
> >>
> >> Has anyone had any experience with those Toshiba drives? Or looking at the specs, do you spot anything suspicious?
> >>
> >> And if you're running a Ceph cluster with various disk brands/models, have you ever noticed some of them standing out when looking at `ceph_osd_commit_latency_ms`?
> >>
> >> Thanks in advance for your feedback.
> >>
> >> Cheers,
> >>
> >> --
> >> Ben
> >>
> >> [1]: https://toshiba.semicon-storage.com/content/dam/toshiba-ss/asia-pacific/docs/product/storage/product-manual/eHDD-MG07ACA-Product-Manual.pdf
> >> [2]: https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-dc-hc500-series/data-sheet-ultrastar-dc-hc520.pdf
> >> _______________________________________________
> >> ceph-users mailing list -- ceph-users@xxxxxxx
> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



