Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

The benefit of disabling the on-drive (volatile write) cache may be at least partly dependent on the HBA: I tested one specific drive model and found no difference, whereas someone else reported a measurable difference for the same model.
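For anyone who wants to reproduce that kind of comparison, below is a rough Python sketch of how one might toggle the volatile write cache across a handful of drives before re-running a latency test.  The device list is a placeholder, and the sdparm (SAS/SCSI) vs. hdparm (SATA) split is an assumption about your hardware; adapt it to your own HBA and drives.

```python
#!/usr/bin/env python3
"""Toggle the volatile write cache on a set of drives so latency can be
compared with the cache on vs. off.  Rough sketch: device paths and the
sdparm/hdparm choice are assumptions about the hardware in front of you."""
import subprocess

DEVICES = ["/dev/sda", "/dev/sdb"]   # placeholder device list

def set_write_cache(dev: str, enabled: bool, sata: bool = False) -> None:
    if sata:
        # hdparm -W0 / -W1 disables/enables the drive's volatile write cache
        subprocess.run(["hdparm", f"-W{1 if enabled else 0}", dev], check=True)
    else:
        # sdparm sets the WCE bit in the SCSI caching mode page; --save persists it
        subprocess.run(["sdparm", "--set", f"WCE={1 if enabled else 0}", "--save", dev],
                       check=True)

if __name__ == "__main__":
    for dev in DEVICES:
        set_write_cache(dev, enabled=False)   # disable cache, then re-run the latency test
```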

> Good to know that we're not alone :) I also looked for a newer firmware, to no avail.

Dell sometimes publishes firmware blobs for drives they resell, but those drives seem to have customized inquiry strings baked in, so the Dell firmware won't apply to "generic" drives without questionable hex-editor hackery.

My experience with Toshiba has been that the only way to get firmware blobs for generic drives is to persuade Toshiba themselves to give them to you, be it through a rep or the CSO.
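As an aside, when chasing firmware it helps to have an inventory of exactly which model and firmware revision each drive reports.  A quick sketch, assuming lsblk and smartctl are available; the parsing is deliberately naive:

```python
#!/usr/bin/env python3
"""Print model and firmware revision for every whole disk in a host.
Sketch only: lsblk/smartctl are standard tools, but field names differ
between ATA and SCSI drives, hence the tolerant prefix matching."""
import subprocess

def list_disks() -> list[str]:
    # lsblk -dn -o NAME,TYPE lists whole disks without partitions or headers
    out = subprocess.run(["lsblk", "-dn", "-o", "NAME,TYPE"],
                         capture_output=True, text=True, check=True).stdout
    return [f"/dev/{name}" for name, typ in
            (line.split() for line in out.splitlines()) if typ == "disk"]

for dev in list_disks():
    info = subprocess.run(["smartctl", "-i", dev],
                          capture_output=True, text=True).stdout
    wanted = ("Device Model", "Vendor", "Product", "Firmware Version", "Revision")
    summary = "; ".join(line.strip() for line in info.splitlines()
                        if line.startswith(wanted))
    print(f"{dev}: {summary}")
```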

> 
> Mark Nelson wrote:
>> This isn't the first time I've seen drive cache cause problematic
>> latency issues, and not always from the same manufacturer.
>> Unfortunately it seems like you really have to test the drives you
>> want to use before deploying them to make sure you don't run into
>> issues.
> 
> That's very true! Data sheets and even public benchmarks can be quite
> deceiving, and two hard drives that seem to have similar performance profiles
> can perform very differently within a Ceph cluster. Lesson learned.

Published benchmarks are often run in a context far removed from anything one would actually deploy in production.

Notably, I've had at least two experiences with drives that passed both the chassis vendor's and our in-house initial qualification, yet still turned out to have serious problems in production.

The first was an HDD.  We had a mix of drives from Vendor A and Vendor B, and found that Vendor B's drives were throwing read errors at 30x the rate of Vendor A's.  After persisting for months through the support layers I was finally able to send drives to Vendor B's engineers, who found at least one design flaw that was tickled by the op pattern of a Filestore (XFS) OSD with a colocated journal.  Firmware could not substantially fix the problem, so all of Vendor B's drives had to be replaced with Vendor A's.  Today BlueStore probably would not trigger the same design flaw.


The second was an SSD that was marketed as "enterprise" but performed certain internal housekeeping only when allowed long idle periods.  The behavior seemed to correlate with BlueStore as well as with a particular serial number range, and it didn't manifest until drives had been in production for at least 90 days and the workload increased.  In that case I was eventually able to work with the vendor on a firmware fix.


The moral of the story is to stress-test every model of drive you intend to deploy if you care about data durability, availability, and performance.  Throw increasingly busy workloads and deeper queue depths at the drives; some will hit an abrupt performance cliff at a certain point.
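As a rough illustration of that kind of queue-depth sweep, here is a sketch that drives fio at increasing iodepth against a candidate drive and prints the mean and 99th-percentile completion latency at each step.  The target device, block size, and runtime are placeholders, and the test as written is destructive to whatever is on the device.

```python
#!/usr/bin/env python3
"""Sweep fio queue depths against one drive and report completion latency,
so an abrupt latency cliff shows up before the drive goes into a cluster.
Sketch only: target, block size, and runtime are assumptions to adjust."""
import json
import subprocess

TARGET = "/dev/sdX"          # placeholder target -- this test overwrites the device!
QUEUE_DEPTHS = [1, 2, 4, 8, 16, 32, 64, 128]

for qd in QUEUE_DEPTHS:
    result = subprocess.run(
        ["fio", "--name=qd-sweep", f"--filename={TARGET}",
         "--rw=randwrite", "--bs=4k", "--direct=1", "--ioengine=libaio",
         f"--iodepth={qd}", "--runtime=60", "--time_based",
         "--output-format=json"],
        capture_output=True, text=True, check=True)
    # fio's JSON output reports completion latency in nanoseconds
    job = json.loads(result.stdout)["jobs"][0]["write"]
    mean_ms = job["clat_ns"]["mean"] / 1e6
    p99_ms = job["clat_ns"]["percentile"]["99.000000"] / 1e6
    print(f"iodepth={qd:4d}  mean={mean_ms:8.2f} ms  p99={p99_ms:8.2f} ms")
```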



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



