[Public] > -----Original Message----- > From: Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx> > Sent: Wednesday, May 11, 2022 03:15 > To: Runa Guo-oc <RunaGuo-oc@xxxxxxxxxxx>; linux-ide@xxxxxxxxxxxxxxx; > Limonciello, Mario <Mario.Limonciello@xxxxxxx>; Cobe Chen > <CobeChen@xxxxxxxxxxx>; Tim Guo <TimGuo@xxxxxxxxxxx>; TonyW > Wang <TonyWWang@xxxxxxxxxxx>; Leo Liu <LeoLiu@xxxxxxxxxxx> > Subject: Re: [PATCH 1/2] ahci: Add PhyRdy Change control on actual LPM > capability > > On 2022/05/07 9:36, Runa Guo-oc wrote: > > On 2022/5/2 21:05, Damien Le Moal wrote: > >> On 2022/04/27 19:18, Runa Guo-oc wrote: > >>> On 2022/4/23 6:37, Damien Le Moal wrote: > >>>> On 4/22/22 18:57, RunaGuo-oc wrote: > >>>>> On 2022/4/21 18:39, Damien Le Moal wrote: > >>>>>> On 4/21/22 18:43, Runa Guo-oc wrote: > >>>>>>> On some platform, when OS enables LPM by default (eg, > min_power), > >>>>>>> then, PhyRdy Change cannot be detected if ahci supports no LPM. > >>>>>> Do you mean "...if the ahci adapter does not support LPM." ? > >>>>> Yes. > >>>>> > >>>>>> If that is what you mean, then min_power should not be set. > >>>>> Yes, I agree with you. But, as we know, link_power_management > >>>>> is a user policy which can be modified, if some users are not > >>>>> familiar with ahci spec, then the above case may happen. > >>>> Users should *never* need to be aware of the HW specs and what can > or > >>>> cannot be done with a particular adapter/drive. User actions trying to > >>>> enable an unsupported feature should be failed, always. > >>>> > >>>> In your case though, quickly checking the AHCI specs, the scontrol > >>>> register bits you change seem to be for preventing *device* initiated > >>>> power mode transitions, not user/host initiated operations. Your > commit > >>>> message should clearly mention that. But I still need to spend more > time > >>>> re-reading the specs to confirm. Will do that next week. > >>>> > >>>> Since you added the CAP flags, these flags should be used to detect > low > >>>> power policies that can be allowed for user actions. > >>>> > >>>> Mario, > >>>> > >>>> Please rebase and repost your patches ! I rebased the for-5.19 branch > on > >>>> rc3 to have the LPM config name revert. We need to fix this LPM mess > :) > >>>> > >>>>>> Mario has patches to fix that. > >>>>> > >>>>> Hmm. How to patch this case ? > >>>> Mario's patches modify the beginning of the sata_link_scr_lpm() > function > >>>> to prevent setting an LPM policy that the adapter/drive does not > support. > >>>> This together with the correct bits set/reset in the scr register will > >>>> only allow LPM transitions that are supported. > >>>> > >>>> It may also be good to revisit ata_scsi_lpm_store() to prevent the user > >>>> from setting an unsupported policy. Currently, that uselessly triggers an > >>>> EH sequence. > >>> To avoid some confusion in this patch set, I want to explain it here. > >>> The patch set involves two LPM related issues, one for the ahci adapter > >>> does not support LPM (no partial & slumber & devslp), the other for > >>> ahci adapter supports part of LPM(eg, only partial, no slumber & devslp). > >>> > >>> The former case is what I metioned in this mail thread. Namely, when > LPM is > >>> enabled, actions trying to enable this unsupported feature will be failed, > >>> but will disable PORT_IRQ_PHYRDY bit at the beginning of the > ahci_set_lpm() > >>> function, which would make PhyRdy Changed cannot be detected. So I > add flags > >>> in the ata_eh_set_lpm() function which will not go to the disable > operation. > >>> > >>> The latter case is what I metioned in "PATCH[2/2]". Namely, if the ahci > >>> adapter only supports partial (no slumber & devslp), then when LPM is > enabled > >>> (eg, min_power), *device* initiated power mode transitions will be > enabled > >>> with the scontrol register bits setting to indicate no restrictions on LPM > >>> transitions. After that, if SSD/HDD sends a DIPM slumber request, it > cannot > >>> be disallowed by ahci adpter for driver not setting scontrol register bits > >>> properly. So I add flags to control it. > >>> > >>> Therefore, Mario's patches in the sata_link_scr_lpm() function may fix > the > >>> issue in "PATCH[2/2]". > >>> > >>> Revisit ata_scsi_lpm_store() to prevent the user from setting an > unsupported > >>> policy may be a way to fix the issue in "PATCH[1/2]", but there may be > another > >>> case where some operating system manufacturers setting LPM default > enable in > >>> driver, like the following code in the ahci_init_one() function, also need > to > >>> control. > >>> > >>> if (ap->flags & ATA_FLAG_EM) > >>> ap->em_message_type = hpriv->em_msg_type; > >>> > >>> + ap->target_lpm_policy = ATA_LPM_MIN_POWER; > >> This one looks wrong. This is set inside ahci_update_initial_lpm_policy() > >> according to the default kernel config > (CONFIG_SATA_MOBILE_LPM_POLICY) and > >> module param + what the drive can do according to ACPI. The problem > though is > >> that the adapter capabilities are not checked in that function, so the initial > >> target lpm policy may be wrong. > >> > >> Since your patch 1/2 adds the hpriv flags indicating the capabilities, you > need > >> to use these in ahci_update_initial_lpm_policy() to validate whatever > initial > >> policy is asked for by the user. > > > > Yes, the above code is not rigorous, existing methods provided by kernel as > you > > said should be used in this case. > > > > In order to use CAP flags to validate user policy, I review the latest kernel > > LPM policies, here is my understanding: > > ATA_LPM_UNKNOWN: default policy, no LPM > > ATA_LPM_MAX_POWER: disable LPM (hipm & dipm) > > ATA_LPM_MED_POWER: enable hipm partial > > ATA_LPM_MED_POWER_WITH_DIPM: enable hipm partial &dipm partial > &slumber > > ATA_LPM_MIN_POWER_WITH_PARTIAL: enable hipm partial &dipm > partial&slumber&devslp > > ATA_LPM_MIN_POWER: enable hipm slumber &dipm partial &slumber > &devslp > > hipm: adpter initiated power mode transitions; > > dipm:*device* initiated power mode transitions; > > > > From my comprehension, user policy in [ATA_LPM_MED_POWER, > ATA_LPM_MIN_POWER] > > need to be validated on adapter's capabilities (partial(y/n), slumber(y/n), > > devslp(y/n)), so, there exits the following cases: > > Note that devslp is a device side feature too. See ata_dev_config_devslp() in > libata-core.c. So even if the adapter supports devslp, if the drive does not, > devslp should not be enabled on the port. > > > 1, (n, n, n), validate it to ATA_LPM_UNKNOWN; > > 2, (n, n, y), validate it > to ATA_LPM_MIN_POWER_WITH_PARTIAL/ATA_LPM_MIN_POWER? > > 3, (n, y, n), validate it to ATA_LPM_MIN_POWER; > > 4, (n, y, y), validate it to ATA_LPM_MIN_POWER; > > 5, (y, n, n), validate it to ATA_LPM_MED_POWER; > > 6, (y, n, y), validate it to ATA_LPM_MIN_POWER_WITH_PARTIAL; > > 7, (y, y, n), default user policy; > > 8, (y, y, y), default user policy; > > ('y' for support, 'n' for not support) > > > > for case 2, I'm not quiet sure, for which may enable hipm partial/slumber. > > For all above cases, the default should be to use the default configured policy > defined by SATA_MOBILE_LPM_POLICY or the ahci module parameter but > corrected to > match what the adapter & device support, including the eventual NO LPM > horkage > flag. Mario's patch started addressing that, but that patch can be improved > using yours. > > > The way I provided above is quiet complicated and may be incomplete. > > It may not be realistic to take all into account, but I think case 1 should > > be taken seriously for which may cause the above PORT_IRQ_PHYRDY > issue. > > yes. > > > Perhaps, I could refer to Mario's patches later (I have not found yet on > > kernel/git ^_^). > > Mario's patch is here: > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore. > kernel.org%2Fall%2F20220404194510.9206-2- > mario.limonciello%40amd.com%2F&data=05%7C01%7Cmario.limonciell > o%40amd.com%7C2ff34a9405444559699208da33264e52%7C3dd8961fe4884e > 608e11a82d994e183d%7C0%7C0%7C637878536864920264%7CUnknown%7CT > WFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLC > JXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=izsUMCw%2BOzV7myxDH > JBqmWnzInVJ2CJz3wfsdgVmDdY%3D&reserved=0 > > Can you add that patch into your series with eventual modifications to better > check the adapter's CAP bits ? > > Mario ? Are you OK with that ? I think that's fine. I've obviously gotten side tracked and didn't get a chance to rebase and send them up. but I do want to bring to your attention that I just found out there was a regression from the original patch that set the policy to mobile. I was hoping to figure out whether it's just a stable artifact or if it's a real problem with mainline kernel before giving to much thought to setting ALL boards to this new policy. It might be a single outlier, or it might be a crystal ball - TBD. This regression happened because it got brought back to 5.4-stable, and as it turns out the exact same FCH controller ID from the client silicon is used in another product. It's an ASUS server system with AMD Epyc processor. The regression is specifically along hotplug, that hotplug no longer works with the more aggressive policies. The regression (including the bisect) is reported here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1971576 There is active open tests to figure out whether the issue happens in mainline kernel as well or if it's just from 5.4 backport. My suggestion is that you pull my patches into your series and lets keep iterating Them, but let's hold off pulling into 5.19 until that's figured out, especially if we have to revert 380cd49e207ba4 to solve that problem then we shouldn't push this policy across all boards (unfortunately). > > > -- > Damien Le Moal > Western Digital Research