Re: [PATCH] USB: hcd-pci: Fully suspend across freeze/thaw cycle

Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> · Thu, 14 Apr 2022 20:06:32 +0300

On 14.4.2022 19.30, Evan Green wrote:
> Hi Alan and Mathias,
> 
> On Thu, Apr 14, 2022 at 7:21 AM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
>>
>> On Thu, Apr 14, 2022 at 05:00:12PM +0300, Mathias Nyman wrote:
>>> On 12.4.2022 18.40, Alan Stern wrote:
>>>> On Tue, Apr 12, 2022 at 05:56:42PM +0300, Mathias Nyman wrote:
>>>>> On 11.4.2022 17.50, Alan Stern wrote:
>>>>>> For example, what would happen if the user unplugs a device right in the
>>>>>> middle of the freeze transition, after the root hub has been frozen but
>>>>>> before the controller is frozen?  We don't want such an unplug event to
>>>>>> prevent the system from going into hibernation -- especially if the root
>>>>>> hub was not enabled for wakeup.
>>>>>
>>>>> We should be able to let system go to hibernate even if we get a disconnect
>>>>> interrupt between roothub and host controller freeze.
>>>>> Host is not yet suspended so no PME# wake is generated, only an interrupt.
>>>>>
>>>>> From Linux PM point of view it should be ok as well as the actual xhci
>>>>> device that is generating the interrupt is hasnt completer freeze()
>>>>>
>>>>> The xhci interrupt handler just needs to make sure that the disconnect
>>>>> isn't propagated if roothub is suspended and wake on disconnect
>>>>> is not set. And definitely make sure xhci doesn't start roothub polling.
>>>>>
>>>>> When freeze() is called for the host we should prevent the host from
>>>>> generating interrupts.
>>>>
>>>> I guess that means adding a new callback.  Or we could just suspend the
>>>> controller, like Evan proposed originally
>>>
>>> Suspending the host in freeze should work.
>>> It will do an extra xhci controller state save stage, but that should be harmless.
>>>
>>> But is there really a need for the suggested noirq part?
>>>
>>> +     .freeze_noirq   = hcd_pci_suspend_noirq,
>>>
>>> That will try to set the host to PCI D3 state.
>>> It seems a bit unnecessary for freeze.
>>
>> Agreed.
>>
>>>>>> (If the root hub _is_ enabled for wakeup then it's questionable.
>>>>>> Unplugging a device would be a wakeup event, so you could easily argue
>>>>>> that it _should_ prevent the system from going into hibernation.  After
>>>>>> all, if the unplug happened a few milliseconds later, after the system
>>>>>> had fully gone into hibernation, then it would cause the system to wake
>>>>>> up.)
>>>>>>
>>>>>>> Would it make sense prevent xHCI interrupt generation in the host
>>>>>>> freeze() stage, clearing the xHCI EINT bit in addition to calling
>>>>>>> check_roothub_suspend()?
>>>>>>> Then enable it back in thaw()
>>>>>>
>>>>>> That won't fully eliminate the problem mentioned in the preceding
>>>>>> paragraphs, although I guess it would help somewhat.
>>>>>
>>>>> Would the following steps solve this?
>>>>>
>>>>> 1. Disable device initiated resume for connected usb devices in freeze()
>>>>>
>>>>> 2. Don't propagate connect or OC changes if roothub is suspended and port wake
>>>>>    flags are disabled. I.E don't kick roothub polling in xhci interrupt
>>>>>    handler here.
>>>>
>>>> I guess you can't just halt the entire host controller when only one of
>>>> the root hubs is suspended with wakeup disabled.  That does complicate
>>>> things.  But you could halt it as soon as both of the root hubs are
>>>> frozen.  Wouldn't that prevent interrupt generation?
>>>
>>> True, but probably easier to just suspend host in freeze() as you stated above.
>>
>> Okay.
>>
>> Evan, this discussion suggests that you rewrite your patch as a series
>> of three:
>>
>>      1. Change choose_wakeup() so that for PM_EVENT_FREEZE, wakeup is
>>         always disabled.
> 
> If I understand this correctly, this means potentially runtime
> resuming the device so its wakeup setting can be consistently set to
> wakeups disabled across a freeze transition. Got it I think in terms
> of the "how".
> 
>>
>>      2. Change the xhci-hcd interrupt handler so that port-status
>>         changes are ignored if the port's root hub is suspended with
>>         wakeup disabled.
> 
> This part confuses me. This would be way deep under
> xhci_handle_event(), probably in handle_port_status(), just throwing
> away certain events that come in the ring. How would we know to go
> back and process those events later? I think we don't need to do this
> if we suspend the controller as in #3 below. The suspended (halted)
> controller wouldn't generate event interrupts (since the spec mentions
> port status change generation is gated on HCHalted). So we're already
> covered against receiving interrupts in this zone by halting the
> controller, and the events stay nicely pending for when we restart it
> in thaw.

Was thinking the same here. It would be nice to have this to comply with
usb spec, keeping roothub from propagating connect/disconnect events
immediately after suspending it with wake flags cleared.

But it's a lot of work to implement this, and for this issue, and linux 
hibernate point of view I don't think it has any real benefit.
The actual device generating the interrupt is the host (parent of roothub),
and that will stop once freeze() is called for it in #3 

> 
> Is the goal of #1 purely a setup change for #2, or does it stand on
> its own even if we nixed #2? Said differently, is #1 trying to ensure
> that wake signaling doesn't occur at all between freeze and thaw, even
> when the controller is suspended and guaranteed not to generate
> interrupts via its "normal" mechanism? I don't have a crisp mental
> picture of how the wake signaling works, but if the controller wake
> mechanism sidesteps the original problem of sending an MSI to a dead
> CPU (as in, it does not use MSIs), then it might be ok as-is.

#1 is needed because xHCI can generate wake events even when halted if
device initiated resume signaling is detected on a roothub port.
Just like it can generate wake events on connect/disconnect if wake flags
are set. (xhci spec figure 4-34, see PLS=Resume)
We want to avoid those wakeups between freeze-thaw

So just #1 and #3 should probably solve this, and be an easier change. 

-Mathias

> 
>>
>>      3. As in the original patch, make the .freeze and .thaw callbacks
>>         in hcd-pci.c call the appropriate suspend and resume routines,
>>         but don't do anything for .freeze_noirq and .thaw_noirq.
> 
> Sure. I had made the _noirq paths match suspend for consistency, I
> wasn't sure if those could mix n match without issues. I'll try it out
> leaving the _noirq callbacks alone.
> -Evan
> 
>>
>> How does that sound?
>>
>> Alan Stern