On 2021/08/17 2:10, Martin K. Petersen wrote:
> 
> Hi Damien!
> 
>> Single LUN multi-actuator hard-disks are capable of seeking and
>> executing multiple commands in parallel. This capability is exposed to
>> the host using the Concurrent Positioning Ranges VPD page (SCSI) and
>> Log (ATA). Each positioning range describes the contiguous set of LBAs
>> that an actuator serves.
> 
> I am not a big fan of the Concurrent Positioning Range terminology since
> it is very specific to the implementation of multi-actuator disk drives.
> With other types of media, "positioning" doesn't make any sense. It is
> unfortunate that CPR is the term that ended up in a spec that covers a
> wide variety of devices and media types.
> 
> I also think that "concurrent positioning" emphasizes the performance
> aspect but not so much the fault domain, which in many ways is the more
> interesting part.
> 
> The purpose of exposing this information to the filesystems must be to
> encourage them to use it. And therefore I think it is important that the
> semantics and information conveyed are applicable outside of the
> multi-actuator use case. It would be easy to expose this kind of
> information for concatenated LVM devices, etc.
> 
> Anyway. I don't really have any objections to the series from an
> implementation perspective. I do think "cpr" as you used in patch #2 is
> a better name than "crange". But again, I wish we could come up with a
> more accurate and less disk-actuator-centric terminology for the block
> layer plumbing.
> 
> I would have voted for "fault_domain" but that ignores the performance
> aspect. "independent_block_range", maybe? Why is naming always so hard?
> :(

I did struggle with the naming too, and "crange" was the best I could
come up with given the specs' wording.

With the single LUN approach, the fault domain does not really change
from a regular device. The typical use in case of bad heads would be to
replace the drive or reformat it at lower capacity with head depop.
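For what it is worth, carving out one device per actuator range is
already possible with the device-mapper linear target; a minimal sketch
(the device name and sector boundaries here are illustrative
assumptions, not values from this thread):

```shell
# Hypothetical dual-actuator drive /dev/sdX, with actuator 0 serving
# sectors 0..976773167 and actuator 1 serving 976773168..1953525167.
# dm-linear table format: <logical_start> <num_sectors> linear <dev> <offset>
dmsetup create sdX-act0 --table "0 976773168 linear /dev/sdX 0"
dmsetup create sdX-act1 --table "0 976773168 linear /dev/sdX 976773168"
# /dev/mapper/sdX-act0 and /dev/mapper/sdX-act1 can then be used as
# separate block devices, e.g. one filesystem per actuator.
```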
That could be avoided with dm-linear on top (one DM device per
actuator), though.

As for the "independent_block_range" idea, I thought of that too, but
the problem is that the tags are still shared between the two actuators,
so the ranges are not really independent at all. One actuator can starve
the other, depending on the host workload, without FS and/or IO
scheduler optimization distributing the commands between the actuators.

The above point led me to this informational-only implementation.
Without optimization, we get the same as today: no changes in
performance and use. Better IOPS is gained for lucky workloads
(typically random ones). Going forward, more reliable IOPS & throughput
gains are possible with some additional changes.

-- 
Damien Le Moal
Western Digital Research