Eric,

> Even new RAID controllers that _do_ provide `io_opt` still do _not_
> indicate partial_stripes_expensive (which is an mdraid feature, but Martin,
> please correct me if I'm wrong here).

partial_stripes_expensive is a bcache thing; I am not sure why it needs a
separate flag. It is implied, although I guess one could argue that RAID0
is a special case since partial writes are not as painful as with parity
RAID.

The SCSI spec states that submitting an I/O that is smaller than io_min
"may incur delays in processing the command". Similarly, submitting a
command larger than io_opt "may incur delays in processing the command".

IOW, the spec says "don't write less than an aligned multiple of the
stripe chunk size" and "don't write more than an aligned full stripe".
That leaves "aligned multiples of the stripe chunk size but less than the
full stripe width" unaccounted for, and I guess that's what the bcache
flag is trying to capture.

SCSI doesn't go into details about RAID levels and other implementation
details, which is why the wording is deliberately vague. But obviously the
expectation is that partial stripe writes are slower than full stripe
writes.

In my book, any component in the stack that sees either io_min or io_opt
should try very hard to send I/Os that are aligned multiples of those
values.

I am not opposed to letting users manually twiddle the settings. But I do
think that we should aim for the stack doing the right thing when it sees
io_opt reported on a device.

-- 
Martin K. Petersen	Oracle Linux Engineering