2.6.25 Changelog:
commit 9af993a92623e022c176459fa6607a564b9a7eaf
Author: Ingo Molnar <mingo@xxxxxxx>
Date: Wed Jan 30 13:34:09 2008 +0100
x86: make ioremap() UC by default
Yes! A mere 120 c_p_a() fixing and rewriting patches later,
we are now confident that we can enable UC by default for
ioremap(), on x86 too.
Every other architecture was doing this already. Doing so
makes Linux more robust against MTRR mixups (which might go
unnoticed if BIOS writers test other OSs only - where PAT
might override bad MTRR defaults).
Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
On 25 Sep 2009, at 5:23 PM, Jason Nymble wrote:
For posterity, I've finally solved this issue. It turned out to
have nothing to do with the interrupts/tasklets themselves. The
driver uses ioremap() to map some reserved memory, and it seems
that from about 2.6.25 onwards plain ioremap() defaults to
ioremap_nocache() behavior, so our driver was doing its memory
operations in the tasklet on uncacheable pages. Calling
ioremap_cache() explicitly in the driver solved the problem
(nice when you can fix a 50-100x perf regression with a one-line
fix!). Oprofile was of tremendous help in tracking this down.
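For anyone hitting the same thing, here is a minimal sketch of what
that one-line change looks like. The physical address, size and
function names below are made up for illustration; only the switch
from ioremap() to ioremap_cache() reflects the actual fix.

/*
 * Sketch only: CARD_RESERVED_PHYS/SIZE and card_map_reserved() are
 * hypothetical names, not the real driver's.
 */
#include <linux/io.h>
#include <linux/errno.h>

#define CARD_RESERVED_PHYS	0x80000000UL		/* hypothetical reserved region */
#define CARD_RESERVED_SIZE	(16UL * 1024 * 1024)

static void __iomem *card_mem;

static int card_map_reserved(void)
{
	/*
	 * Plain ioremap() is uncached (UC) by default on x86 from
	 * 2.6.25 onwards; ask for a cacheable mapping explicitly so
	 * the tasklet's memory operations stay fast.
	 */
	card_mem = ioremap_cache(CARD_RESERVED_PHYS, CARD_RESERVED_SIZE);
	if (!card_mem)
		return -ENOMEM;
	return 0;
}

The teardown side is unchanged: a plain iounmap(card_mem) releases
the mapping regardless of which ioremap variant created it.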
On 09 Sep 2009, at 10:39 AM, Jason Nymble wrote:
Hi,
Background: We use a custom kernel driver module for our PCIe
device which processes bulk data between the host and the card. The
card issues MSI interrupts at up to 20kHz to the host, and the
driver interrupt routine essentially just calls tasklet_schedule()
and returns IRQ_HANDLED, and the work is performed inside the
tasklet routine. This has worked very well for us for the past
several years, with acceptably low overhead on the processor
servicing the interrupts and running the tasklet, using Linux
kernel versions from about 2.6.13 to 2.6.24.
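For context, the interrupt/tasklet split described above looks
roughly like the sketch below. All of the names (card_irq_handler,
card_tasklet_func, card_tasklet) are invented for illustration,
since the actual driver code isn't part of this thread.

#include <linux/interrupt.h>

static void card_tasklet_func(unsigned long data);
static DECLARE_TASKLET(card_tasklet, card_tasklet_func, 0);

/* MSI handler: do no work here, just defer to the tasklet. */
static irqreturn_t card_irq_handler(int irq, void *dev_id)
{
	tasklet_schedule(&card_tasklet);
	return IRQ_HANDLED;
}

/* Runs in softirq context: process the bulk data for the card. */
static void card_tasklet_func(unsigned long data)
{
	/* ... move/process data in the ioremap()'d region ... */
}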
Recent tests on kernels from 2.6.25 to 2.6.30, however, indicate
a serious regression. The CPU core servicing the
interrupts/tasklets shows 100% si usage in top for ksoftirqd,
and the driver can consequently handle only a very small
fraction of what it could handle on kernels <= 2.6.24 (a
slowdown of around 50-100x). Even when we scale the interrupt
rate back to 1kHz we still see this poor behavior, and as far as
we can tell the time isn't actually being spent in our tasklet
code itself (though we're not 100% sure of that).
The question is: does anybody know of something that changed in
kernels >= 2.6.25 that might cause this behavior? I've pored
over changelogs, lwn.net articles, the lwn.net kernel API change
lists, the kernelnewbies kernel changes page, etc., and cannot
find anything that would explain it.
Any suggestions for ways to track down where the problem lies?
I've tried running kernels with all the debugging+sanity checks
enabled, and they don't report any badness in the driver. My
next step is to get oprofile going and try to determine exactly
where that time is being spent. I would _maybe_ have believed it
was a Linux kernel bug (e.g. the softirq that handles tasklets
somehow not ending its loop) if it only happened on one kernel
version, but it seems to happen on all kernels from 2.6.25
onwards ...
Thanks in advance