Thanks Mark. With the help of the crowd on Telegram, we found that (at
least here) the drive cache needs to be disabled like this:

```
for x in /sys/class/scsi_disk/*/cache_type; do echo 'write through' > $x; done
```

This disables the cache (confirmed afterwards with hdparm), but more
importantly fio --fsync=1 --direct=1 now gives 400 iops (as opposed to
80 iops out of the box), and commit_latency_ms drops from 80ms to 3ms :-)

We'll do more testing here, but this looks like a magic switch. (We should
consider documenting or automating this to some extent, imho; a sketch of
an example fio invocation and one way to persist the setting is appended
after the thread below.)

Cheers, Dan

On Thu, Nov 4, 2021 at 11:48 AM Mark Nelson <mnelson@xxxxxxxxxx> wrote:
>
> Hi Dan,
>
> I can't speak for those specific Toshiba drives, but we have absolutely
> seen very strange behavior (sometimes with cache enabled and sometimes
> not) with different drives and firmwares over the years from various
> manufacturers. There was one especially bad case from back in the
> Inktank days, but my memory is a bit fuzzy. I think we were seeing
> weird periodic commit latency spikes that grew worse over time. That
> one might have been cache related. I believe we ended up doing a lot of
> tests with blktrace and iowatcher to show the manufacturer what we were
> seeing, but I don't recall if anything ever got fixed.
>
> Mark
>
> On 11/4/21 5:33 AM, Dan van der Ster wrote:
> > Hello Benoît (and others in this great thread),
> >
> > Apologies for replying to this ancient thread.
> >
> > We have been debugging similar issues during an ongoing migration to
> > new servers with TOSHIBA MG07ACA14TE HDDs.
> >
> > We see a similar commit_latency_ms issue on the new drives (~60ms in
> > our env vs ~20ms for some old 6TB Seagates).
> > However, disabling the write cache (hdparm -W 0) made absolutely no
> > difference for us.
> >
> > So we're wondering:
> > * Are we running the same firmware as you? (We have 0104.) I wonder if
> >   Toshiba has changed the implementation of the cache in the meantime...
> > * Is anyone aware of some HBA or other setting in the middle that
> >   might be masking this setting from reaching the drive?
> >
> > Best Regards,
> >
> > Dan
> >
> >
> > On Wed, Jun 24, 2020 at 9:44 AM Benoît Knecht <bknecht@xxxxxxxxxxxxx> wrote:
> >> Hi,
> >>
> >> We have a Nautilus (14.2.9) Ceph cluster with two types of HDDs:
> >>
> >> - TOSHIBA MG07ACA14TE [1]
> >> - HGST HUH721212ALE604 [2]
> >>
> >> They're all bluestore OSDs with no separate DB+WAL and part of the same pool.
> >>
> >> We noticed that while the HGST OSDs have a commit latency of about 15ms,
> >> the Toshiba OSDs hover around 150ms (these values come from the
> >> `ceph_osd_commit_latency_ms` metric in Prometheus).
> >>
> >> On paper, it seems like those drives have very similar specs, so it's not
> >> clear to me why we're seeing such a large difference when it comes to
> >> commit latency.
> >>
> >> Has anyone had any experience with those Toshiba drives? Or looking at the
> >> specs, do you spot anything suspicious?
> >>
> >> And if you're running a Ceph cluster with various disk brands/models, have
> >> you ever noticed some of them standing out when looking at
> >> `ceph_osd_commit_latency_ms`?
> >>
> >> Thanks in advance for your feedback.
> >>
> >> Cheers,
> >>
> >> --
> >> Ben
> >>
> >> [1]: https://toshiba.semicon-storage.com/content/dam/toshiba-ss/asia-pacific/docs/product/storage/product-manual/eHDD-MG07ACA-Product-Manual.pdf
> >> [2]: https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-dc-hc500-series/data-sheet-ultrastar-dc-hc520.pdf
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
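For reference, here is a rough sketch of the fio test and the automation idea mentioned at the top of the thread. Dan only quotes the flags --fsync=1 --direct=1; the remaining parameters below (job name, target device, block size, runtime) are illustrative assumptions, not from the original message:

```
# Single-threaded 4k sync-write test: every write is followed by an fsync and
# bypasses the page cache, so the result approximates the drive's commit latency.
# /dev/sdX is a placeholder; this writes to the raw device and will destroy data.
fio --name=fsync-write --filename=/dev/sdX --rw=write --bs=4k \
    --ioengine=libaio --iodepth=1 --numjobs=1 \
    --fsync=1 --direct=1 --runtime=60 --time_based
```

And a minimal sketch of one way to make the "write through" setting persistent across reboots and newly attached disks, assuming a udev-based system; the rule filename is arbitrary and this should be treated as a starting point rather than a tested recommendation:

```
# /etc/udev/rules.d/99-scsi-cache-writethrough.rules  (hypothetical filename)
# Apply "write through" to every SCSI disk as it appears, mirroring the
# for-loop at the top of the thread but surviving reboots and hot-plugs.
ACTION=="add|change", SUBSYSTEM=="scsi_disk", ATTR{cache_type}="write through"
```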