Re: interrupt/tasklet issue in custom driver on recent kernels

On 09 Sep 2009, at 3:47 PM, Mulyadi Santosa wrote:

Hi Jason...

On Wed, Sep 9, 2009 at 3:39 PM, Jason Nymble <jason.nymble@xxxxxxxxx> wrote:
Hi,

Background: We use a custom kernel driver module for our PCIe device which processes bulk data between the host and the card. The card issues MSI interrupts at up to 20kHz to the host. The driver interrupt routine essentially just calls tasklet_schedule() and returns IRQ_HANDLED, and the work is performed inside the tasklet routine. This has worked very well for us for the past several years, with acceptably low overhead on the processor servicing the interrupts and running the tasklet, using Linux kernel versions from about 2.6.13 to 2.6.24.
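In outline, the interrupt path looks something like this (the mydev_* names here are placeholders, not our real identifiers):

#include <linux/interrupt.h>

static void mydev_do_work(unsigned long data);
static DECLARE_TASKLET(mydev_tasklet, mydev_do_work, 0);

/* MSI interrupt handler: defer all real work to the tasklet. */
static irqreturn_t mydev_isr(int irq, void *dev_id)
{
	tasklet_schedule(&mydev_tasklet);
	return IRQ_HANDLED;
}

/* Runs in softirq context; does the actual bulk-data processing. */
static void mydev_do_work(unsigned long data)
{
	/* ... move data between host and card ... */
}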

Recent tests on kernels from 2.6.25 to 2.6.30 indicate a serious regression, however. The CPU core servicing the interrupts/tasklets shows 100% si usage in top for ksoftirqd, and the driver can consequently only handle a very small fraction of what it could handle on kernels <= 2.6.24 (a slowdown of around 50-100x). Even when we scale back our interrupt rate to 1kHz we still see this poor behavior, and from what we can tell the time isn't actually spent in our tasklet code itself (though we're not 100% sure of this).

I guess it has something to do with CFS (the Completely Fair Scheduler). Your tasklet's time slice is "punished" so it cannot dominate the entire CPU time by itself, hence the name "fair scheduler".

Profiling with OProfile might also reveal how the time is divided among the various kernel functions (including your driver's code).
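Something along these lines should do it (assuming an uncompressed vmlinux with symbols is available):

opcontrol --vmlinux=/path/to/vmlinux
opcontrol --start
... reproduce the load ...
opcontrol --stop
opreport -l

opreport -l will then list where the samples landed, symbol by symbol, so you can see whether the time goes to your tasklet or to the softirq machinery around it.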

As a workaround, I guess you could try creating as many workqueue threads as you have CPU cores, so that the time slicing is split among these threads.
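A rough sketch of that idea, assuming the workqueue API of these kernels (the mydev_* names are again placeholders, not your driver's real code):

#include <linux/interrupt.h>
#include <linux/workqueue.h>

static struct workqueue_struct *mydev_wq;
static struct work_struct mydev_work;

/* Runs in process context, so the scheduler can preempt it fairly. */
static void mydev_work_fn(struct work_struct *work)
{
	/* ... process the bulk data ... */
}

static irqreturn_t mydev_isr(int irq, void *dev_id)
{
	queue_work(mydev_wq, &mydev_work);
	return IRQ_HANDLED;
}

/* At init time: create_workqueue() spawns one worker thread per CPU. */
static int mydev_setup(void)
{
	mydev_wq = create_workqueue("mydev");
	if (!mydev_wq)
		return -ENOMEM;
	INIT_WORK(&mydev_work, mydev_work_fn);
	return 0;
}

Note that queue_work() queues onto the current CPU's worker thread, so unlike a tasklet the processing happens in a schedulable thread rather than in softirq context.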

Interesting, I also wondered at one point whether it has something to do with CFS, and even contemplated trying the BFS patches. What bothers me, though, is why kernels <= 2.6.24 used nowhere near 100% CPU, while kernels >= 2.6.25 use 100% CPU and get MUCH less work done (50-100x less). That doesn't sound to me like a symptom of merely being scheduled more fairly, unless there is a bug. (There is nothing else using up the CPU in this particular test scenario.)

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ

