Re: Kernel ISR Latency

Brian Wrenn <dcbrianw@xxxxxxxxx> · Mon, 10 Apr 2017 23:36:26 -0400

Update: I tried running the scheme described earlier in this thread
with one modification: no CRC, just sequence checking.  (Each SPI
transmission increments a count held in the first bytes.)  In this
case, no ISR had latency long enough to set the error GPIO (output
from the SOM's perspective) high.  The highest priority ISR, after
those with priority 99, is the SPI read ISR at priority 80.  I set the
two RT threads in user space to priority 10.  So at this point, I'm
trying to determine why the SPI read ISR doesn't preempt those user
space threads in less than 6 milliseconds.  This failure to preempt
happens for between 0.2%-0.01% of the micro-controller issued
interrupts.  The error percentages vary some per run.  FWIW, any
non-RT load I place on the system (generic network and processor
busywork) don't seem to have much impact.

Also, in answering how I'm measuring latency, I reviewed what I wrote
and I think I didn't explain something clearly.  A bad CRC or bad
sequence number makes the the error indicator GPIO pin go high (a GPIO
output from the SOM's perspective/input from the logic analyzer's
perspective).  A high latency (~6 milliseconds after the
micro-controller issued interrupt) invocation of the kernel module ISR
that performs the two SPI transfers results in the subsequent
micro-controller's interrupt occurring during the SPI transfers, which
causes the following SPI transfer to have bad data and hence a bad
CRC.  The .tiff I attached earlier illustrates this.  If you look at
the sad face SPI transfer (it only shows one of the SPIs, not both),
you can see the ISR for the next transfer overlap with the very end of
the sad face transfer.

Thanks,
Brian

On Mon, Apr 10, 2017 at 5:31 PM, Brian Wrenn <dcbrianw@xxxxxxxxx> wrote:
> I imagine that it would be a bad idea to prioritize my ISR ahead of
> any one of these?
>
>    12  FF     99 139   - [posixcputmr/0]
>    14  FF     99 139   - [migration/0]
>    15  FF     99 139   - [watchdog/0]
>    16  FF     99 139   - [watchdog/1]
>    17  FF     99 139   - [migration/1]
>    21  FF     99 139   - [posixcputmr/1]
>    24  FF     99 139   - [watchdog/2]
>    25  FF     99 139   - [migration/2]
>    29  FF     99 139   - [posixcputmr/2]
>    32  FF     99 139   - [watchdog/3]
>    33  FF     99 139   - [migration/3]
>    37  FF     99 139   - [posixcputmr/3]
>
> At this stage, that's all that prioritized higher.
>
> On Mon, Apr 10, 2017 at 4:48 PM, Brian Wrenn <dcbrianw@xxxxxxxxx> wrote:
>> On Mon, Apr 10, 2017 at 3:55 PM, Julia Cartwright <julia@xxxxxx> wrote:
>>> On Mon, Apr 10, 2017 at 03:12:28PM -0400, Brian Wrenn wrote:
>>>> Hi, I'm trying to track down the source of some ISR latency that I'm
>>>> observing on a logic analyzer.
>>>>
>>>> Just for some background, my setup is the following.  I'm running
>>>> Linaro 4.4.9 with PREEMPT RT Patch 17 on a Variscite SD410 SOM devkit.
>>>> My kernel has the following PREEMPT RT options.
>>> [..]
>>>>
>>>> I have a micro-controller attached to a GPIO pin and two separate SPI
>>>> ports.  Every 10 milliseconds the micro-controller writes 18 Kbytes of
>>>> data to each of 2 SPI ports (total of 36 Kbytes between the two) and
>>>> issues an interrupt on the GPIO.  A kernel module has an ISR
>>>> registered to the GPIO pin.  When the ISR executes, it copies 18
>>>> Kbytes of data from each SPI.  There are two RT threads executing in
>>>> user space as SCHED_RR with identical priorities less than (i.e. lower
>>>> than) 50, where as I understand the kernel runs at priority 50.
>>>
>>> It's a bit more complicated than "the kernel runs at priority 50".
>>
>> Yes, better stated, my kernel space IRQ gets a priority of 50.  (Most
>> of the others seem to have 50 as well.)
>>
>>>
>>> Perhaps what you're getting at is that by default, with irq forced
>>> threading enabled (which is the default with -rt build kernels) the
>>> irqthreads have the default scheduling policy FIFO with priority 50.
>>> It's up to the administrator/system integrator to appropriately
>>> prioritize these irqthreads to suit the application.
>>>
>>
>> Pursuant to that, I'm reviewing this to gain a better understanding of
>> such things.  Please forgive me; I hadn't found this link prior to
>> messaging the listserve.
>>
>> http://subversion.ffado.org/wiki/IrqPriorities
>>
>>> For your usecase, this means that likely _all_ of the peripheral
>>> interrupts, including the low priority stuff you might not care about
>>> for your application (like networking or uart or whatever) are being
>>> prioritized right alongside your latency sensitive gpio handler.
>>>
>>>> Each of those user space threads copies data from some shared memory
>>>> within the kernel module and performs some basic data integrity
>>>> checks, i.e.  a SHA1.
>>>
>>> Just to better understand: how are your RT threads signalled by your
>>> gpio handler?
>>
>> The user space RT threads have mmap'ed memory shared with the kernel
>> module.  The kernel module writes to a designated memory location the
>> current read offset so that the user space RT threads know where to
>> read from iteration to iteration.
>>
>>>
>>>> The vast majority of interrupts incur a delay in range of tens of
>>>> MICROseconds, meaning the ISR executes within such amount of time of
>>>> the micro-controller raising the GPIO pin of the ISR.  However,
>>>> occasionally, more than six MILLIseconds passes between the time of
>>>> the micro-controller issuing the GPIO interrupt and the ISR beginning
>>>> to execute.
>>>
>>> Also: how are you measuring this latency?  In particular, where is your
>>> test point for the "ISR beginning"?
>>>>
>>
>> Upon detection of a bad CRC or bad sequence number, the user space
>> threads raise another GPIO, not the GPIO of the ISR, another one that
>> triggers a logic analyzer to dump a forward captured buffer.  Then, I
>> got back through the capture and identify the high latency ISR
>> execution.  (See attached image.)
>>
>>>> I'm trying to track down the cause of the 6+ MILLIsecond latency,
>>>> which I would not expect to happen.  Are there any well practiced
>>>> techniques for determining what factor(s) may be introducing this
>>>> latency?  I looked at the description for RT Watchdog, but I don't
>>>> think it would help me much because the only two RT threads other than
>>>> the kernel have a lower priority.
>>>
>>> I don't think the RT watchdog will help you here, however, I would
>>> recommend that you get acquainted with using ftrace/trace-cmd and the
>>> irq and sched traceevents.
>>>
>>>    Julia
>>
>> It sounds like to me that maybe you've confirmed that I'm looking in
>> the right direction w.r.t. modifying the kernel ISRs with something
>> like 'chrt' and 'rtiq'.  Yes?  But I also will look into
>> ftrace/trace-cmd and irq and sched trace events.
>>
>> Thanks,
>> Brian
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html