Re: [RFC] USB: Fix persist resume of some SS USB devices

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 16, 2014 at 01:25:21PM +0800, Pratyush Anand wrote:
> Hi Alan,
> 
> Thanks a lot for your review.
> 
> On Tue, Jul 15, 2014 at 10:46:00PM +0800, Alan Stern wrote:
> > On Tue, 15 Jul 2014, Pratyush Anand wrote:
> >
> > > Problem Summary: Problem has been observed generally with PM states
> > > where VBUS goes off during suspend. There are some SS USB devices which
> > > takes slightly longer time for link training compared to many others.
> > > Such devices fails to reconnect with same old address which was
> > > associated with it before suspend.
> > >
> > > When system  resumes, at some point of time (dpm_run_callback->
> > > usb_dev_resume->usb_resume->usb_resume_both->usb_resume_device->
> > > usb_port_resume) SW reads hub status. If device is present,
> > > then it finishes port resume and re-enumerates device with same
> > > address. If device is not present, then SW thinks that device was
> > > removed during suspend and therefore does logical disconnection
> > > and removes all the resource allocated for this device.
> > >
> > > Now, if I put sufficient delay just before root hub status read in
> > > usb_resume_device then, SW sees always that device is present. In normal
> > > course(without any delay) SW sees that no device is present and then SW
> > > removes all resource associated with the device at this port.  In the
> > > latter case, after sometime, device says that hey I am here, now host
> > > enumerates it, but with new address.
> > >
> > > Problem had been reproduced when I connect verbatim USB3.0 hard disc
> > > with my STiH407 XHCI host running with 3.10 kernel.
> > >
> > > I see that similar problem has been reported here.
> > > https://bugzilla.kernel.org/show_bug.cgi?id=53211
> > > Reading above it seems that bug was not in 3.6.6 and was present in 3.8
> > > and again it was not present for some in 3.12.6, while it was present for
> > > few others. I tested with 3.13-FC19 with i686 desktop, problem was still
> > > there. However, I was failed to reproduce it with 3.16-RC4 running at
> > > same i686 machine. I would say it is just a random observation. Problem for few
> > > devices is always there, as I am unable to find a proper fix for the
> > > issue.
> > >
> > > So, now question is what should be the amount of delay as per USB
> > > specification so that host is always able to recognize suspended device
> > > after resume.
> > >
> > >     -- XHCI specs 4.19.4 says that when Link training is successful,
> > > port sets CSC bit to 1 (root hub controller says that device is
> > > present). So the delay should be the maximum amount of time from the
> > > moment when host enables VBUS to the moment it is able to perform Link
> > > Training.  Unfortunately, I could not find any direct statement in USB
> > > specification for such timeout. Since path from VBUS on to U0 will
> > > follow, like this Rx.Detect.Quite->Rx.Detect.Active->Polling.LFPS->
> > > Polling.ExEQ->Polling.Active->Polling.Configuration->Polling.Idle->U0,
> > > therefore we can set maximum delay as tRxDetectQuietTimeout (12 ms) +
> > > tPollingLFPSTimeout (360 ms) + tPollingActiveTimeout (12 ms) +
> > > tPollingConfigurationTimeout (12 ms) + tPollingIdleTimeout (2 ms)
> > > (398 =~ 400 ms).
> > >
> > > This patch implements above delay, but only for SS device and when
> > > persist is enabled. After every 10 ms SW reads port status and if it
> > > is enabled, SW exits delay loop.
> > >
> > > So, for the good device overhead is just the time to execute
> > > hub_port_status, while for the bad device penalty could be as long as
> > > 400 mS.
> > >
> > > Results:
> > >
> > > Verbatim USB SS hard disk connected with STiH407 USB host running 3.10
> > > Kernel resumes in 461 msecs without this patch, but hard disk is
> > > assigned a new device address. Same system resumes in 790 msecs with
> > > this patch, but with old device address.
> >
> > What happens if you unplug the device while the system is asleep and
> > leave it unplugged during resume?
> 
> Good catch. I need to take care of it.
> 
> >
> >
> > > Signed-off-by: Pratyush Anand <pratyush.anand@xxxxxx>
> > > ---
> > >  drivers/usb/core/hub.c | 37 +++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 37 insertions(+)
> > >
> > > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> > > index 21b99b4..93495b7 100644
> > > --- a/drivers/usb/core/hub.c
> > > +++ b/drivers/usb/core/hub.c
> > > @@ -3245,6 +3245,39 @@ static int finish_port_resume(struct usb_device *udev)
> > >  }
> > >
> > >  /*
> > > + * There are some SS USB devices which takes longer time for link training.
> > > + * XHCI specs 4.19.4 says that when Link training is successful, port
> > > + * sets CSC bit to 1. So if SW reads port status before successful link
> > > + * training, then it will not find device to be present.
> > > + * Successful SS link training follows Rx.Detect.Quite->Rx.Detect.Active->
> >
> > .Quiet?
> 
> Will correct.
> 
> >
> > > + * Polling.LFPS->Polling.ExEQ->Polling.Active->Polling.Configuration->
> > > + * Polling.Idle->U0.
> > > + * Following routine waits for either till port is enable or a timeout
> > > + * of 400 ms whichever is earlier [tRxDetectQuietTimeout (12 ms) +
> > > + * tPollingLFPSTimeout (360 ms) + tPollingActiveTimeout (12 ms) +
> > > + * tPollingConfigurationTimeout (12 ms) + * tPollingIdleTimeout (2 ms)]
> > > + *
> > > + * This routine should only be called when persist is enabled for a SS
> > > + * device.
> > > + */
> > > +static int wait_for_ss_port_enable(struct usb_device *udev,
> > > +                                   struct usb_hub *hub,
> > > +                                   int *port1,
> > > +                                   u16 *portchange,
> > > +                                   u16 *portstatus)
> >
> > Continuation lines should be indented by 2 tab stops, not 5.
> >
> 
> Will correct.
> 
> > > +{
> > > +   int status, wait_count_20ms = 0;
> > > +
> > > +   while (wait_count_20ms++ < 20) {
> > > +           status = hub_port_status(hub, *port1, portstatus, portchange);
> > > +           if (status || *portstatus & USB_PORT_STAT_CONNECTION)
> > > +                   return status;
> > > +           msleep(20);
> > > +   }
> > > +   return hub_port_status(hub, *port1, portstatus, portchange);
> > > +}
> >
> > This might be a little clearer:
> >
> >       int status = 0, delay_ms = 0;
> >
> >       whle (delay_ms < 400) {
> >               if (status || (*portstatus & USB_PORT_STAT_CONNECTION))
> >                       break;
> >               msleep(20);
> >               delay_ms += 20;
> >               status = hub_port_status(hub, *port1, portstatus, portchange);
> >       }
> >       return status;
> >
> > Also, it might help to check the link state in this loop.  If the state
> > indicates that link training is not taking place -- no device is
> > connected -- the loop can end early.
> 
> If device was removed while system asleep, then port will remain in
> rxDetect for ever. Ideally ,port should move to polling in 12 mS (at
> max).
> So, after 20 ms I can check if link status is neither in polling, nor
> in U0 then exit from the loop. May be as follows:
> 
> 
>         int status = 0, delay_ms = 0;
>         u16 link_state;
> 
>         while (delay_ms < 400) {
>                 if (status || *portstatus & USB_PORT_STAT_CONNECTION)
>                         break;
>                 msleep(20);
>                 delay_ms += 20;
>                 status = hub_port_status(hub, *port1, portstatus, portchange);
>                 link_state = *portstatus & USB_PORT_STAT_LINK_STATE;

Hummmm..May be this solution is not perfect. Have been reported with
at least one device which failed with this patch too. A print of
link_state for that device(sleep changed to 2ms) shows that for a long
time link was in rxDetect. May be I will try to see with analyzer what
exactly happens with such devices.

Giving 2nd thought to timeout calculation: Probably we can not
calculate timeout by adding all substates timeout. It would be valid
only when link does not go back in the state machine. For example, it
might happen that host is in polling.active and it did not receive
sufficient number of TS2 in tPollingActiveTimeout and so it moves
backward to polling.LFPS. But next time when state machine advances,
it is able to reach to U0. So the total time may be more than 400 mS
and a successful linkup. Device which failed with this patch worked
with 2000 ms timeout :( Not sure how can we manage timeout.

~Pratyush


>                 if (link_state != USB_SS_PORT_LS_POLLING &&
>                                 link_state != USB_SS_PORT_LS_U0)
>                         break;
>         }
>         return status;
> 
> ~Pratyush
> >
> > Alan Stern
> --
> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux