Re: interrupt/tasklet issue in custom driver on recent kernels

For posterity: I've finally solved this issue, and it turned out to have nothing to do with the interrupts or tasklets themselves. The driver uses ioremap() to map some reserved memory, and from about 2.6.25 onwards ioremap() seems to default to ioremap_nocache(), so the tasklet was doing its memory operations on uncacheable pages. Calling ioremap_cache() explicitly in the driver solved the issue (nice when you can fix a perf regression of 50-100x with a single-line change!). OProfile was a tremendous help in tracking this down.
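A minimal sketch of what such a one-line change might look like, not the actual driver code. RESERVED_PHYS_BASE, RESERVED_SIZE, map_reserved_region() and unmap_reserved_region() are hypothetical names and values, since the thread does not give the region's address or size. It assumes an x86 kernel of the 2.6.25 to 2.6.30 era; ioremap_cache() is not available on every architecture or kernel version.

```c
#include <linux/errno.h>
#include <linux/io.h>
#include <linux/types.h>

/* Hypothetical values: the thread does not state the region's address or size. */
#define RESERVED_PHYS_BASE 0x10000000UL
#define RESERVED_SIZE      (16UL * 1024 * 1024)

static void __iomem *reserved_mem;

static int map_reserved_region(void)
{
	/*
	 * Before: reserved_mem = ioremap(RESERVED_PHYS_BASE, RESERVED_SIZE);
	 * Per the thread, from about 2.6.25 this gives an uncached mapping.
	 * Asking for a cached mapping explicitly is the one-line change.
	 */
	reserved_mem = ioremap_cache(RESERVED_PHYS_BASE, RESERVED_SIZE);
	if (!reserved_mem)
		return -ENOMEM;
	return 0;
}

static void unmap_reserved_region(void)
{
	if (reserved_mem) {
		iounmap(reserved_mem);
		reserved_mem = NULL;
	}
}
```

A cached mapping like this is only appropriate for ordinary RAM, such as a reserved memory region. Device registers and other MMIO generally still need an uncached mapping.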

On 09 Sep 2009, at 10:39 AM, Jason Nymble wrote:

Hi,

Background: We use a custom kernel driver module for our PCIe device which processes bulk data between the host and the card. The card issues MSI interrupts at up to 20kHz to the host, and the driver interrupt routine essentially just calls tasklet_schedule() and returns IRQ_HANDLED, and the work is performed inside the tasklet routine. This has worked very well for us for the past several years, with acceptably low overhead on the processor servicing the interrupts and running the tasklet, using Linux kernel versions from about 2.6.13 to 2.6.24.

Recent tests on kernels from 2.6.25 to 2.6.30 show a serious regression, however. The CPU core servicing the interrupts/tasklets shows 100% si usage in top for ksoftirqd, and the driver can consequently handle only a very small fraction of what it could handle on kernels <= 2.6.24 (a slowdown of around 50-100x). Even when we scale our interrupt rate back to 1kHz, we still see this poor behavior, and from what we can tell the time isn't actually spent in our tasklet code itself (not 100% sure of this).
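To put the rates above in perspective, here is a small userspace sketch of the per-interrupt time budget, assuming a single core services every interrupt. budget_ns() and print_budgets() are hypothetical helper names, not part of the driver.

```c
#include <stdint.h>
#include <stdio.h>

/* Time available per interrupt, in nanoseconds, if one core handles them all. */
uint64_t budget_ns(uint64_t irq_rate_hz)
{
    return UINT64_C(1000000000) / irq_rate_hz;
}

/* Print the budget for the two interrupt rates mentioned in the thread. */
void print_budgets(void)
{
    printf("20 kHz: %llu ns per interrupt\n",
           (unsigned long long)budget_ns(20000));
    printf(" 1 kHz: %llu ns per interrupt\n",
           (unsigned long long)budget_ns(1000));
}
```

At 20kHz the budget is 50 microseconds per interrupt, and at 1kHz it is 1 millisecond. If one core is still saturated at 1kHz, the softirq work triggered by each interrupt is presumably taking on the order of a millisecond.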

The question is: does anybody know of something that changed in kernels >= 2.6.25 that might cause this behavior? I've pored over changelogs, lwn.net articles, the lwn.net kernel API change lists, the kernelnewbies kernel changes page, etc., and cannot find anything that would explain what I'm seeing.

Any suggestions for ways to track down where the problem lies? I've tried running kernels with all the debugging+sanity checks enabled, and they don't report any badness in the driver. My next step is to get oprofile going and try to determine exactly where that time is spent. I would _maybe_ have believed it could perhaps be a Linux kernel bug (e.g. the softirq that handles tasklets somehow not ending its loop or something) if it only happened on one kernel version, but it seems to happen on all kernels from 2.6.25 onwards ...

Thanks in advance


--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ

