On 31.01.2019 23:11, Alan Stern wrote:
On Thu, 31 Jan 2019, Jan-Marek Glogowski wrote:
Perhaps you didn't notice that at the end, hub_activate() calls
kick_hub_wq(). That routine calls
usb_autopm_get_interface_no_resume(), which will prevent the hub trim
suspending until hub_event() calls usb_autopm_put_interface().
Therefore to make things work correctly, hub_activate() has to ensure
that the hub->event_bits and hub->change_bits fields are set correctly,
so that hub_event() will do the right thing when it runs (before the
hub is allowed to go back into suspend).
Therefore if a port is in a funny state, hub_activate() should detect
this and set one of these bit fields appropriately. That's one reason
why it contains all those tests for portstatus and portchange. If a
missing test needs to be added, that's what you should do.
That would be like the v3 of my patch, which started this thread and
detected and handled the warm-reset request of the port in
hub_activate() on resume. Maybe we can set some of these bits
instead, or its already handled by the hub_port_reset() call; have to
check.
Suppose instead of making hub_activate() do the warm reset during
resume, you merely have it set the port's bit in hub->event_bits.
Then hub_event() would see that it needed to call port_event() for that
port. Near the end of port_event() there is code that checks whether a
warm reset is needed; would that existing code be able to solve your
problem?
Sounds good.
So if we can't rely on a hub status urb returning before hub_event(), we
should make sure hub->event_bits are set manually in hub_activate() for all
the same cases as hcd->driver->hub_status_data() in hcd_bus_suspend().
Otherwise we can end up in a hub suspend-resume loop.
I also included a patch in some previous mail after Mathias Nyman
commented about the NULL status_urb, which added
usb_hcd_poll_rh_status directly after the root hubs usb_remote_wakeup
call.
A lot of stuff is deferred via workqueue or polled. As I understand
it - in my case - resume, suspend and status race and miss each other
in some endless loop.
This indicates that something is not using the existing synchronization
mechanisms properly. For example, something sees that a status request
is needed but doesn't store that information in the right way to
prevent a suspend from happening before the status information has been
retrieved.
This is what I believe is going on:
starting point: hub is suspended, xhci gets a port link status change interrupt,
port link is in invalid state, xhci driver resumes roothub and starts roothub polling
loop start:
usb_hcd_poll_rh_status() // roothub polling
see a port needs attention when calling hcd->driver->hub_status_data()
no hcd->status_urb available as hub is suspended, can't giveback urb
hub_activate() // hub resume
manually check port status, nothing found.
usb_submit_urb(hub->urb, GFP_NOIO)
kick_hub_wq(hub);
hub_event() // started by kick_hub_wq() in hub_activate()
hub->event_bits and hub->change_bits are zero as hub status urb didn't return.
put_interface() // as no activity found on hub.
start suspending hub, removes hub->status_urb somewhere in the process
hcd_bus_suspend()
if (hcd->driver->hub_status_data()) // true, races with a root-hub event
hcd_bus_resume()
usb_hcd_poll_rh_status() expires again at this point, goto loop start
To avoid this loop I thought we could force the hub status urb to complete at some point,
but as Alan commented it might be better to make sure the manual port status
checks in hub_activate() covers all the same cases as hcd->driver->hub_status_data().
-Mathias