Re: [PATCH 4/5] PCI: wait device ready after pci_pm_reset()

Sinan Kaya <okaya@xxxxxxxxxxxxxx> · Mon, 16 Oct 2017 08:51:49 -0400

On 10/12/2017 12:48 PM, Sinan Kaya wrote:
> On 10/11/2017 6:06 PM, Bjorn Helgaas wrote:
>>> static int pci_pm_reset(struct pci_dev *dev, int probe)
>>>  {
>>> +	unsigned int delay = dev->d3_delay;
>>>  	u16 csr;
>>>  
>>>  	if (!dev->pm_cap || dev->dev_flags & PCI_DEV_FLAGS_NO_PM_RESET)
>>> @@ -3988,7 +3989,10 @@ static int pci_pm_reset(struct pci_dev *dev, int probe)
>>>  	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, csr);
>>>  	pci_dev_d3_sleep(dev);
>>>  
>>> -	return 0;
>>> +	if (delay < pci_pm_d3_delay)
>>> +		delay = pci_pm_d3_delay;
>>> +
>>> +	return pci_dev_wait(dev, "PM D3->D0", delay, 1000);
>> 1) Why do we wait up to 1 second here, when we wait up to 60 seconds
>> for the other methods?  Can they all be the same?  Maybe a #define for
>> it?
> 
> I know you want to have similar behavior for systems that do and do not support
> CRS. That was the reason why I converted flr wait function to into dev_wait function.
> 
> However, here is the problem:
> 
> For systems that do not support CRS, there is no way of knowing whether we
> are reading 0xFFFFFFFF because the endpoint is not reachable due to an error
> like "it doesn't support this reset type" or if it is actually emitting a CRS.
> 
> If one system has a problem with pm_reset, this code would add an unnecessary
> 1 second delay into the reset path. If I make it 60 it would be something like:
> 
> 1. try reset method A
> 2. wait 60 seconds
> 3. try reset method B
> 4. wait 60 seconds. 
> 5. try reset method C
> 6. wait 60 seconds
> 
> This might end up being a regression on some system. 
> 
> I'm still leaning towards a wait only if we are observing a CRS. What's your
> thought on this?
> 
> then the sequence would be.
> 
> 1. try reset method A
> 2. if CRS pending, wait 60 seconds
> 3. try reset method B
> 4. if CRS pending, wait 60 seconds. 
> 5. try reset method C
> 6. if CRS pending, wait 60 seconds
> 

Thinking more about this. Another possibility is to have an adjustable sleep time.
Start with 60 seconds for all reset types. If somebody doesn't like it,
have a kernel command line override.

>>
>> 2) I don't really like the fact that we do the initial sleep one place
>> and then pass the length of that sleep here.  It's hard to verify
>> they're the same and keep them in sync.  I think the only thing you
>> use initial_wait for is to include that time in the dmesg messages.
>> Maybe we should just omit that time from the message and drop the
>> parameter?
>>
> 
> This was for printing reasons like you spotted, I can certainly get rid of
> the initial_wait.
> 

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.