On 2021/08/06 17:36, Hannes Reinecke wrote:
> On 8/6/21 6:05 AM, Damien Le Moal wrote:
>> On 2021/08/06 12:42, Martin K. Petersen wrote:
>>>
>>> Damien,
>>>
>>>> Single LUN multi-actuator hard-disks are capable of seeking and
>>>> executing multiple commands in parallel. This capability is exposed
>>>> to the host using the Concurrent Positioning Ranges VPD page (SCSI)
>>>> and Log (ATA). Each positioning range describes the contiguous set
>>>> of LBAs that an actuator serves.
>>>
>>> I have to say that I prefer the multi-LUN model.
>>
>> It is certainly easier: nothing to do :)
>> SATA, as usual, makes things harder...
>>
>>>
>>>> The first patch adds the block layer plumbing to expose concurrent
>>>> sector ranges of the device through sysfs as a sub-directory of the
>>>> device sysfs queue directory.
>>>
>>> So how do you envision this range reporting should work when putting
>>> DM/MD on top of a multi-actuator disk?
>>
>> The ranges are attached to the device request queue, so the DM/MD
>> target driver can use that information from the underlying devices for
>> whatever optimization is possible. For the logical device exposed by
>> the target driver, the ranges are not limits, so they are not
>> inherited. As is, right now, DM target devices will not show any range
>> information for the logical devices they create, even if the
>> underlying devices have multiple ranges.
>>
>> The DM/MD target driver is free to set any range information pertinent
>> to the target. E.g. dm-linear could set the range information
>> corresponding to the sector chunks from the different devices used to
>> build the dm-linear device.
>>
> And indeed, that would be the easiest consumer.
> One 'just' needs to have a simple script converting the sysfs ranges
> into the corresponding dm-linear table definitions, and create one DM
> device for each range.
> That would simulate the multi-LUN approach.
> Not sure if that would warrant a 'real' DM target, seeing that it's
> fully scriptable.
>
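Indeed, this is fully scriptable. A rough, untested sketch is below: it
reads the reported ranges and prints one dm-linear table per actuator,
ready to be fed to dmsetup. The sysfs attribute names it uses
(independent_access_ranges/<N>/sector and nr_sectors) are illustrative
and may not match the exact names exposed by the patches.

#!/usr/bin/env python3
#
# Sketch: print one dm-linear table line per concurrent positioning
# range of a disk, so that one DM device can be created per actuator.
# Assumed (illustrative) sysfs layout:
#   /sys/block/<disk>/queue/independent_access_ranges/<N>/sector
#   /sys/block/<disk>/queue/independent_access_ranges/<N>/nr_sectors
#
# Example use:
#   ./ranges_to_dm.py sda | while read name table; do
#           echo "$table" | dmsetup create "$name"
#   done

import sys
from pathlib import Path

def read_ranges(disk):
    base = Path("/sys/block", disk, "queue", "independent_access_ranges")
    if not base.is_dir():
        return []
    ranges = []
    for d in sorted(base.iterdir(), key=lambda p: int(p.name)):
        sector = int((d / "sector").read_text())
        nr_sectors = int((d / "nr_sectors").read_text())
        ranges.append((sector, nr_sectors))
    return ranges

def main():
    disk = sys.argv[1]
    ranges = read_ranges(disk)
    if not ranges:
        print(f"{disk}: no concurrent positioning ranges", file=sys.stderr)
        sys.exit(1)
    # dm-linear table format: <logical start> <length> linear <dev> <offset>
    for i, (start, nr_sectors) in enumerate(ranges):
        print(f"{disk}-actuator{i} 0 {nr_sectors} linear /dev/{disk} {start}")

if __name__ == "__main__":
    main()

That gives one /dev/mapper/<disk>-actuatorN device per range, which is
essentially the multi-LUN view, without needing a dedicated DM target.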
>>
>>> And even without multi-actuator drives, how would you express
>>> concurrent ranges on a DM/MD device sitting on top of several
>>> single-actuator devices?
>>
>> Similar comment as above: it is up to the DM/MD target driver to
>> decide if the range information can be useful. For dm-linear, there
>> are obvious cases where it is. Ex: 2 single actuator drives
>> concatenated together can generate 2 ranges, similarly to a real
>> split-actuator disk. Expressing the chunks of a dm-linear setup as
>> ranges may not always be possible though, that is, if we keep the
>> assumption that a range is independent from the others in terms of
>> command execution. Ex: a dm-linear setup that shuffles a drive LBA
>> mapping (high to low and low to high) has no business showing sector
>> ranges.
>>
>>> While I appreciate that it is easy to just export what the hardware
>>> reports in sysfs, I also think we should consider how filesystems
>>> would use that information. And how things would work outside of the
>>> simple fs-on-top-of-multi-actuator-drive case.
>>
>> Without any change anywhere in existing code (kernel and applications
>> using raw disk accesses), things will just work as is. The multi/split
>> actuator drive will behave as a single actuator drive, even for
>> commands spanning range boundaries. Your guess on potential IOPS gains
>> is as good as mine in this case. Performance will totally depend on
>> the workload, but will not be worse than that of an equivalent single
>> actuator disk.
>>
>> FS block allocators can definitely use the range information to
>> distribute writes among actuators. For reads, well, gains will depend
>> on the workload, obviously, but optimizations at the block IO
>> scheduler level can improve things too, especially if the drive is
>> being used at a QD beyond its capability (that is, requests are
>> accumulated in the IO scheduler).
>>
>> Similar write optimization can be achieved by applications using block
>> device files directly. This series is intended for this case for now.
>> FS and block IO scheduler optimizations can be added later.
>>
> Rumours have it that Paolo Valente is working on adapting BFQ to
> utilize the range information for better actuator utilisation.

Paolo has a talk on this subject scheduled for SNIA SDC 2021.

https://storagedeveloper.org/events/sdc-2021/abstracts#hd-Walker

> And eventually one should modify filesystem utilities like xfs to
> adapt the metadata layout to multi-actuator drives.
>
> The _real_ fun starts once the HDD manufacturers start putting out
> multi-actuator SMR drives :-)

Well, that does not change things that much in the end. The same
constraints remain, and the sector ranges will be aligned to zones. So
no added difficulty.

>
> Cheers,
>
> Hannes
>

--
Damien Le Moal
Western Digital Research