Re: [RFC][PATCH] ehci: make ISO schedule immune to long irq delay

Ming Lei <tom.leiming@xxxxxxxxx> · Thu, 21 Jul 2011 14:39:37 +0800

Hi,

On Wed, Jul 20, 2011 at 10:48 PM, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, 20 Jul 2011 tom.leiming@xxxxxxxxx wrote:
>
>> From: Ming Lei <tom.leiming@xxxxxxxxx>
>>
>> The current ehci ISO schedule only supports that the max irq delay
>> for ehci is 20ms(see 'if (excess >= mod - 2 * SCHEDULE_SLOP)' in
>> iso_stream_schedule). And if ehci irq is disabled for longer time
>> than 20ms, ehci ISO schedule will behave badly and cause
>> failure of "request %p would overflow".
>
> How often does this happen?  If it does happen then something is badly
> wrong already.  In such cases the failure ought to be easily visible --
> we don't want to hide it and pretend that everything is all right.
>
> For example, there was a case in the mailing list not long ago where
> the IRQ delay was as long as 40 ms.  The scheduling failure caused
> people to investigate, and they found that their wifi driver was
> leaving interrupts disabled much longer than it should have.  Without
> this obvious failure, they might never have known.
>
> In short, it's not obvious that this problem needs to be fixed.

Yes, generally speaking, it always happens when something is wrong
in system, but maybe in some embedded system, there may be a
little long irq delay.

But maybe we can improve ehci-hcd so that ehci can continue to work well
even if the bad irq delay happens, but just prints some warning.

>
>> This patch uses kernel time to check if iso schedule was handled
>> late, and fixes the problem.
>>
>> Certainly, the problem can be fixed by increasing the allowable
>> delay, such as:
>>
>>       if (excess >= mod - N * SCHEDULE_SLOP)  /*iso_stream_schedule*/
>>
>> and take a larger value of N than 2, but the fix is not feasible
>> enough, also may decrease the allowable ISO schedule window, especially
>> for small periodic size case(such as, 256 and 512).
>>
>> The method in this patch can fix all the above shortcomings, and the
>> only drawback is that it may introduce extra computation on ktime.
>
> The idea is good but the implementation could be improved.
>
>> Signed-off-by: Ming Lei <tom.leiming@xxxxxxxxx>
>> ---
>>  drivers/usb/host/ehci-sched.c |   17 +++++++++++++++--
>>  drivers/usb/host/ehci.h       |    1 +
>>  2 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/usb/host/ehci-sched.c b/drivers/usb/host/ehci-sched.c
>> index 6c9fbe3..1846cbe 100644
>> --- a/drivers/usb/host/ehci-sched.c
>> +++ b/drivers/usb/host/ehci-sched.c
>> @@ -1432,16 +1432,17 @@ iso_stream_schedule (
>>               else
>>                       next = now;
>>
>> -             /* Fell behind (by up to twice the slop amount)?
>> +             /* Fell behind (by up to any time)
>
> If you're going to erase the main point of the parenthetical remark,
> you might as well get rid of the entire remark.
>
>>                * We decide based on the time of the last currently-scheduled
>>                * slot, not the time of the next available slot.
>>                */
>>               excess = (stream->next_uframe - period - next) & (mod - 1);
>> -             if (excess >= mod - 2 * SCHEDULE_SLOP)
>> +             if (ktime_us_delta(stream->next_ktime, ktime_get()) < 0)
>
> This computation can be wrong, because ktime values are not
> synchronized with the EHCI controller's frame counter.  Also, it

>From ktime_get(), seems it may always return the latest time, so
I am not sure why you say it is not synchronized.

> ignores the value of "next".  Thus you really can't remove the
> existing test.
>
> However, it is possible to add an additional test.  If the
> ktime_us_delta value is < 10000 (i.e., 10 ms) then the submission is
> definitely too late.

Yes, it is good.

>
>>                       start = next + excess - mod + period *
>>                                       DIV_ROUND_UP(mod - excess, period);
>
> You need to change this computation too, because it's no longer
> necessarily true that next + excess lies between mod and 2*mod.  Get
> rid of the "- mod" and instead compute "& (mod - 1)".

If the computation is changed, the later check should return failed, so
I just keep it to make the check happy.

>
>>               else
>>                       start = next + excess + period;
>> +
>>               if (start - now >= mod) {
>>                       ehci_dbg(ehci, "request %p would overflow (%d+%d >= %d)\n",
>>                                       urb, start - now - period, period,
>> @@ -1591,6 +1592,17 @@ itd_link (struct ehci_hcd *ehci, unsigned frame, struct ehci_itd *itd)
>>       *hw_p = cpu_to_hc32(ehci, itd->itd_dma | Q_TYPE_ITD);
>>  }
>>
>> +static void get_next_uframe_ktime(struct ehci_hcd *ehci,
>> +     struct ehci_iso_stream  *stream)
>> +{
>> +     unsigned        mod = ehci->periodic_size << 3;
>> +     u32             now =
>> +             ehci_readl(ehci, &ehci->regs->frame_index) & (mod - 1);
>> +
>> +     stream->next_ktime = ktime_add_us(ktime_get(),
>> +                     ((stream->next_uframe - now) & (mod - 1)) * 125);
>> +}
>
> You don't need two separate "& (mod - 1)" operations; only the second
> one is necessary.

Yes, you are right.

>
>> +
>>  /* fit urb's itds into the selected schedule slot; activate as needed */
>>  static int
>>  itd_link_urb (
>> @@ -1658,6 +1670,7 @@ itd_link_urb (
>>               }
>>       }
>>       stream->next_uframe = next_uframe;
>> +     get_next_uframe_ktime(ehci, stream);
>
> Something like this is also needed in sitd_link_urb().

Yes, it is needed.

Thanks for your comment.

thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html