Re: [RFC][PATCH] ehci: make ISO schedule immune to long irq delay

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Thu, 21 Jul 2011 11:05:16 -0400 (EDT)

On Thu, 21 Jul 2011, Ming Lei wrote:

> Hi,
> 
> On Wed, Jul 20, 2011 at 10:48 PM, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> > On Wed, 20 Jul 2011 tom.leiming@xxxxxxxxx wrote:
> >
> >> From: Ming Lei <tom.leiming@xxxxxxxxx>
> >>
> >> The current ehci ISO schedule only supports that the max irq delay
> >> for ehci is 20ms(see 'if (excess >= mod - 2 * SCHEDULE_SLOP)' in
> >> iso_stream_schedule). And if ehci irq is disabled for longer time
> >> than 20ms, ehci ISO schedule will behave badly and cause
> >> failure of "request %p would overflow".
> >
> > How often does this happen? �If it does happen then something is badly
> > wrong already. �In such cases the failure ought to be easily visible --
> > we don't want to hide it and pretend that everything is all right.
> >
> > For example, there was a case in the mailing list not long ago where
> > the IRQ delay was as long as 40 ms. �The scheduling failure caused
> > people to investigate, and they found that their wifi driver was
> > leaving interrupts disabled much longer than it should have. �Without
> > this obvious failure, they might never have known.
> >
> > In short, it's not obvious that this problem needs to be fixed.
> 
> Yes, generally speaking, it always happens when something is wrong
> in system, but maybe in some embedded system, there may be a
> little long irq delay.

Do you have an actual use case?  Or is this all just theoretical?

> But maybe we can improve ehci-hcd so that ehci can continue to work well
> even if the bad irq delay happens, but just prints some warning.

Adding a big fat warning to the patch might be good.

> >> � � � � � � � �* We decide based on the time of the last currently-scheduled
> >> � � � � � � � �* slot, not the time of the next available slot.
> >> � � � � � � � �*/
> >> � � � � � � � excess = (stream->next_uframe - period - next) & (mod - 1);
> >> - � � � � � � if (excess >= mod - 2 * SCHEDULE_SLOP)
> >> + � � � � � � if (ktime_us_delta(stream->next_ktime, ktime_get()) < 0)
> >
> > This computation can be wrong, because ktime values are not
> > synchronized with the EHCI controller's frame counter. �Also, it
> 
> From ktime_get(), seems it may always return the latest time, so
> I am not sure why you say it is not synchronized.

ktime values are based on a different clock from the frame counter.  
These clocks do not run at the same rate.  For example, it's possible
that when the frame counter has advanced by 128 frames, the ktime value
may have advanced by only 127985 us.  If that happens, your check would
indicate the URB was submitted in time even though it really wasn't.

> >> � � � � � � � � � � � start = next + excess - mod + period *
> >> � � � � � � � � � � � � � � � � � � � DIV_ROUND_UP(mod - excess, period);
> >
> > You need to change this computation too, because it's no longer
> > necessarily true that next + excess lies between mod and 2*mod. �Get
> > rid of the "- mod" and instead compute "& (mod - 1)".
> 
> If the computation is changed, the later check should return failed, so
> I just keep it to make the check happy.

I don't understand.  You always need to compute "start" correctly,
whether the check failed or not.  With your new check added, the 
existing computation will not always be correct.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html