On 4/17/24 12:14, Niklas Cassel wrote: > On Mon, Apr 15, 2024 at 04:49:46PM +0200, Gustav Ekelund wrote: >> On 4/13/24 02:29, Damien Le Moal wrote: >>> On 4/12/24 22:48, Gustav Ekelund wrote: >>>> Expose a new sysfs attribute to userspace that gives root the ability to >>>> lower the link speed in a scsi_device at runtime. The handle enables >>>> programs to, based on external circumstances that may be unbeknownst to >>>> the kernel, determine if a link should slow down to perhaps achieve a >>>> stabler signal. External circumstances could include the mission time >>>> of the connected hardware or observations to temperature trends. >>> >>> may, perhaps, could... This does not sound very deterministic. Do you have an >>> actual practical use case where this patch is useful and solve a real problem ? >>> >>> Strictly speaking, if you are seeing link stability issues due to temperature or >>> other environmental factors (humidity, altitude), then either you are operating >>> your hardware (board and/or HDD) outside of their environmental specifications, >>> or you have some serious hardware issues (which can be a simple as a bad SATA >>> cable or an inappropriate power supply). In both cases, I do not think that this >>> patch will be of any help. >>> >>> Furthermore, libata already lowers a link speed automatically at runtime if it >>> sees too many NCQ errors. Isn't that enough ? And we also have the horkage flags >>> to force a maximum link speed for a device/adapter, which can also be specified >>> as a libata module argument (libata.force). >>> >>>> Writing 1 to /sys/block/*/device/down_link_spd signals the kernel to >>>> first lower the link speed one step with sata_down_spd_limit and then >>>> finish off with sata_link_hardreset. >>> >>> We already have "/sys/class/ata_link/*/hw_sata_spd_limit", which is read-only >>> for now. So if you can really justify this manual link speed tuning for an >>> actual use case (not a hypothetical one), then the way to go would be to make >>> that attribute RW and implement its store() method to lower the link speed at >>> runtime. >>> >>> And by the way, looking at what that attribute says, I always get: >>> <unknown> >>> >>> So it looks like there is an issue with it that went unnoticed (because no one >>> is using it...). This needs some fixing. >>> >> Hello Damien and Niklas, >> >> Thank you for the feedback. >> >> I have a hotplug system, where the links behave differently depending >> on the disk model connected. For some models the kernel emits a lot of >> bus errors, but mostly not enough errors for it to automatically lower >> the link speed, except during high workloads. I have not observed any >> data-loss regarding the errors, but the excessive logging becomes a problem. > > It might be interesting to compare the output of: > $ hdparm -I > > for a drive that you can hot plug insert without errors, against a drive > that gives you errors on hot plug insertion, to see if this can give you > a hint of why they behave differently. > > (e.g. certain features, e.g. DevSleep, is only enabled if there is support > in the HBA, the port, and the drive.) > > > Kind regards, > Niklas Hi Niklas, I mostly tested on the 6.1 kernel, and it is a quite peculiar hardware problem, so it isn't caused by anythin in the latest 6.9 rc. Thank you for the advice of using hdparm, maybe I can diff preset of features between the models. I will share any interesting findings I come across. Best regards Gustav