Re: [PATCH] PCI/ASPM: fix unexpected behavior when re-enabling L1

On 02.10.2023 17:14, Bjorn Helgaas wrote:
> [+cc Sathy, Lukas]
> 
> On Sat, Aug 26, 2023 at 01:10:35PM +0200, Heiner Kallweit wrote:
>> After the referenced commit we may see L1 sub-states being active
>> unexpectedly. Consider the following scenario as an example:
>> r8169 disables L1 because of known hardware issues on a number of
>> systems. Implicitly L1.1 and L1.2 are disabled too.
>> On my system L1 and L1.1 work fine, but L1.2 causes missed
>> rx packets. Therefore I write 1 to l1_1_aspm.
>> This removes ASPM_STATE_L1 from the disabled modes and therefore
>> unexpectedly also enables L1.2. So return to the old behavior.
>>
>> A comment in the commit message of the referenced change correctly points
>> out that this behavior is inconsistent with aspm_attr_store_common().
>> So change aspm_attr_store_common() accordingly.
> 
> I think we should split this into a pure revert of fb097dcd5a28 with
> the description of the unintended consequence, followed by another
> patch to change aspm_attr_store_common().
> 
OK

> I guess the existing aspm_attr_store_common() behavior allows a
> similar unexpected behavior?  For example, in this sequence:
> 
>   - Write 0 to "l1_aspm" to disable L1
>   - Write 1 to "l1_1_aspm" to enable L1.1
> 
> does L1.2 get implicitly enabled as well even though that's clearly
> not what the user intended?
> 
Right, it's the same here. That's why the patch includes the second change.
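
To make the sequence concrete, here is a minimal userspace sketch of the
bitmask logic, not the kernel code itself. The SK_* names, their bit values,
and the helper functions are hypothetical and chosen only for illustration.
The "usable" rule (a sub-state needs both itself and L1 to be not disabled)
is a simplifying assumption as well.

```c
#include <stdbool.h>

/* Illustrative bit values for this sketch only (assumed, not from aspm.c). */
#define SK_L1    0x1u
#define SK_L1_1  0x2u
#define SK_L1_2  0x4u
#define SK_L1SS  (SK_L1_1 | SK_L1_2)

/* Model of the sysfs store path: returns the new "disabled" mask. */
static unsigned int sk_store(unsigned int disabled, unsigned int state,
			     bool enable, bool patched)
{
	if (enable) {
		disabled &= ~state;
		/* enabling a sub-state also re-enables L1 itself */
		if (state & SK_L1SS)
			disabled &= ~SK_L1;
	} else {
		disabled |= state;
		/* patched behavior: disabling L1 also disables the sub-states */
		if (patched && (state & SK_L1))
			disabled |= SK_L1SS;
	}
	return disabled;
}

/* Assumption: L1.2 is usable only if neither L1 nor L1.2 is disabled. */
static bool sk_l1_2_usable(unsigned int disabled)
{
	return !(disabled & (SK_L1 | SK_L1_2));
}

/* Replays: write 0 to l1_aspm, then write 1 to l1_1_aspm. */
static bool sk_l1_2_after_sequence(bool patched)
{
	unsigned int disabled = 0;

	disabled = sk_store(disabled, SK_L1, false, patched);
	disabled = sk_store(disabled, SK_L1_1, true, patched);
	return sk_l1_2_usable(disabled);
}
```

Calling sk_l1_2_after_sequence(false) from a small main() returns true in
this model, i.e. L1.2 ends up usable although only L1.1 was requested. With
patched set to true it returns false, because L1.2 stays in the disabled mask.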

> There's also the separate question of how the sysfs file and the
> pci_disable_link_state() API should interact.  Drivers use that API
> when they know about a defect in their device, but the user can
> override that via sysfs.
> 
> In [1], we have a similar situation with D3cold support, where we're
> thinking that we should not allow a user to use sysfs to override that
> driver knowledge.
> 
In my r8169 use case, ASPM works fine on one system but causes issues
on another one with the same NIC chip version. So the cause may be a mix
of NIC hw erratum, BIOS bug, mainboard chipset incompatibilities, etc.
Therefore I disable L1 by default, and allow users to re-enable
L1/L1.1/L1.2 if their system isn't affected.

> Bjorn
> 
> [1] https://lore.kernel.org/r/b8a7f4af2b73f6b506ad8ddee59d747cbf834606.1695025365.git.lukas@xxxxxxxxx
> 
>> Fixes: fb097dcd5a28 ("PCI/ASPM: Disable only ASPM_STATE_L1 when driver disables L1")
>> Signed-off-by: Heiner Kallweit <hkallweit1@xxxxxxxxx>
>> ---
>>  drivers/pci/pcie/aspm.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>> index 3dafba0b5..6d3788257 100644
>> --- a/drivers/pci/pcie/aspm.c
>> +++ b/drivers/pci/pcie/aspm.c
>> @@ -1063,7 +1063,7 @@ static int __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
>>  	if (state & PCIE_LINK_STATE_L0S)
>>  		link->aspm_disable |= ASPM_STATE_L0S;
>>  	if (state & PCIE_LINK_STATE_L1)
>> -		link->aspm_disable |= ASPM_STATE_L1;
>> +		link->aspm_disable |= ASPM_STATE_L1 | ASPM_STATE_L1SS;
>>  	if (state & PCIE_LINK_STATE_L1_1)
>>  		link->aspm_disable |= ASPM_STATE_L1_1;
>>  	if (state & PCIE_LINK_STATE_L1_2)
>> @@ -1251,6 +1251,8 @@ static ssize_t aspm_attr_store_common(struct device *dev,
>>  			link->aspm_disable &= ~ASPM_STATE_L1;
>>  	} else {
>>  		link->aspm_disable |= state;
>> +		if (state & ASPM_STATE_L1)
>> +			link->aspm_disable |= ASPM_STATE_L1SS;
>>  	}
>>  
>>  	pcie_config_aspm_link(link, policy_to_aspm_state(link));
>> -- 
>> 2.42.0
>>



