Re: [PATCH] ata: Add sdev attribute to lower link speed in runtime

Niklas Cassel <cassel@xxxxxxxxxx> · Wed, 17 Apr 2024 12:14:25 +0200

On Mon, Apr 15, 2024 at 04:49:46PM +0200, Gustav Ekelund wrote:
> On 4/13/24 02:29, Damien Le Moal wrote:
> > On 4/12/24 22:48, Gustav Ekelund wrote:
> >> Expose a new sysfs attribute to userspace that gives root the ability to
> >> lower the link speed in a scsi_device at runtime. The handle enables
> >> programs to, based on external circumstances that may be unbeknownst to
> >> the kernel, determine if a link should slow down to perhaps achieve a
> >> stabler signal. External circumstances could include the mission time
> >> of the connected hardware or observations to temperature trends.
> > 
> > may, perhaps, could... This does not sound very deterministic. Do you have an
> > actual practical use case where this patch is useful and solve a real problem ?
> > 
> > Strictly speaking, if you are seeing link stability issues due to temperature or
> > other environmental factors (humidity, altitude), then either you are operating
> > your hardware (board and/or HDD) outside of their environmental specifications,
> > or you have some serious hardware issues (which can be a simple as a bad SATA
> > cable or an inappropriate power supply). In both cases, I do not think that this
> > patch will be of any help.
> > 
> > Furthermore, libata already lowers a link speed automatically at runtime if it
> > sees too many NCQ errors. Isn't that enough ? And we also have the horkage flags
> > to force a maximum link speed for a device/adapter, which can also be specified
> > as a libata module argument (libata.force).
> > 
> >> Writing 1 to /sys/block/*/device/down_link_spd signals the kernel to
> >> first lower the link speed one step with sata_down_spd_limit and then
> >> finish off with sata_link_hardreset.
> > 
> > We already have "/sys/class/ata_link/*/hw_sata_spd_limit", which is read-only
> > for now. So if you can really justify this manual link speed tuning for an
> > actual use case (not a hypothetical one), then the way to go would be to make
> > that attribute RW and implement its store() method to lower the link speed at
> > runtime.
> > 
> > And by the way, looking at what that attribute says, I always get:
> > <unknown>
> > 
> > So it looks like there is an issue with it that went unnoticed (because no one
> > is using it...). This needs some fixing.
> > 
> Hello Damien and Niklas,
> 
> Thank you for the feedback.
> 
> I have a hotplug system, where the links behave differently depending
> on the disk model connected. For some models the kernel emits a lot of
> bus errors, but mostly not enough errors for it to automatically lower
> the link speed, except during high workloads. I have not observed any
> data-loss regarding the errors, but the excessive logging becomes a problem.

It might be interesting to compare the output of:
$ hdparm -I

for a drive that you can hot plug insert without errors, against a drive
that gives you errors on hot plug insertion, to see if this can give you
a hint of why they behave differently.

(e.g. certain features, e.g. DevSleep, is only enabled if there is support
in the HBA, the port, and the drive.)

Kind regards,
Niklas