RT-ers,

Lately we've been struggling with some performance issues on high-core-count (>16 cores) NUMA machines with the RT kernel. While troubleshooting, we tried using the 'numactl' program to constrain our measurement tool (rteval) to a particular memory node, rather than letting everything float. Doing so showed marked improvement in both max latency and jitter.

While this doesn't solve our performance problems, I thought it might make sense to have a --numa mode for cyclictest that complements the --smp mode just added. The big difference is that when using --numa, each measurement thread (one per cpu) has its stack allocated from the memory node associated with its cpu. Also, the major data structures for each thread (parameter block, statistics block and histogram) are allocated from the appropriate node. This is done with calls into libnuma, which means cyclictest will gain a dependency on libnuma.

The intent is to measure latency on a NUMA system the same way a well-written RT application would run on a NUMA machine, that is, minimizing off-node memory references.

If you're interested in looking at this, please pull the numa branch from my git repo at:

git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git

and let me know if you find bugs or disagree with the approach.

Thanks,
Clark
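
For anyone who wants a feel for the stack placement before pulling the branch, here is a minimal sketch of the idea. It is not the code in the numa branch. It shows a thread being started on a caller-allocated stack via pthread_attr_setstack(), with the parameter and statistics blocks coming from the same allocator. The names node_alloc(), node_free(), run_thread_on_node(), the struct layouts and the sizes are all made up for illustration. To keep the sketch free of the libnuma dependency, node_alloc() is a stand-in that uses aligned_alloc() and ignores the node; a NUMA build would presumably call libnuma's numa_alloc_onnode(size, node) and numa_free(ptr, size) at those two spots. CPU pinning is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define ALLOC_ALIGN 4096u           /* hypothetical alignment for every block */
#define STACK_SIZE  (256u * 1024u)  /* hypothetical size, multiple of ALLOC_ALIGN */

/* Stand-in allocator: ignores 'node'. Assumption: a NUMA build would
 * call numa_alloc_onnode(size, node) from libnuma here instead. */
static void *node_alloc(size_t size, int node)
{
    (void)node;
    assert(size > 0);
    size_t rounded = (size + ALLOC_ALIGN - 1) / ALLOC_ALIGN * ALLOC_ALIGN;
    return aligned_alloc(ALLOC_ALIGN, rounded);
}

/* Stand-in for numa_free(ptr, size); the size is unused with free(). */
static void node_free(void *ptr, size_t size)
{
    (void)size;
    free(ptr);
}

/* Hypothetical statistics block, far smaller than a real one. */
struct thread_stat {
    long samples;
    int on_own_stack;
};

/* Hypothetical parameter block handed to the measurement thread. */
struct thread_param {
    int cpu;
    int node;
    void *stack;
    struct thread_stat *stat;
};

/* Thread body: records whether a local variable lives inside the
 * stack region that the caller allocated for this thread. */
static void *measure(void *arg)
{
    struct thread_param *par = arg;
    char marker = 0;
    uintptr_t lo = (uintptr_t)par->stack;
    uintptr_t here = (uintptr_t)&marker;

    par->stat->on_own_stack = (here >= lo && here < lo + STACK_SIZE);
    par->stat->samples = 1;
    return NULL;
}

/* Allocate stack, parameter block and statistics block for one thread
 * from the same "node", run the thread, and copy its statistics out.
 * Returns 0 on success, -1 on any failure. */
static int run_thread_on_node(int cpu, int node, struct thread_stat *out)
{
    int ret = -1;
    pthread_attr_t attr;
    pthread_t tid;
    struct thread_param *par = node_alloc(sizeof(*par), node);
    struct thread_stat *stat = node_alloc(sizeof(*stat), node);
    void *stack = node_alloc(STACK_SIZE, node);

    if (!par || !stat || !stack)
        goto out;

    stat->samples = 0;
    stat->on_own_stack = 0;
    par->cpu = cpu;
    par->node = node;
    par->stack = stack;
    par->stat = stat;

    if (pthread_attr_init(&attr) != 0)
        goto out;
    /* Hand the pre-allocated region to the new thread as its stack. */
    if (pthread_attr_setstack(&attr, stack, STACK_SIZE) == 0 &&
        pthread_create(&tid, &attr, measure, par) == 0 &&
        pthread_join(tid, NULL) == 0) {
        *out = *stat;
        ret = 0;
    }
    pthread_attr_destroy(&attr);
out:
    /* The thread has been joined (or never started), so the stack is idle. */
    node_free(stack, STACK_SIZE);
    node_free(stat, sizeof(*stat));
    node_free(par, sizeof(*par));
    return ret;
}
```

A caller would invoke run_thread_on_node(cpu, node, &st) once per cpu, where node is the memory node for that cpu. With libnuma that lookup would presumably be numa_node_of_cpu(cpu), and the build would need -lnuma in addition to -pthread.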