> On a 2 socket, 8 core system, I see anywhere up to nearly 16x better > performance on a stress test. The common cases of call-all, and wait > are improved the least, however I think that if call-single and nowait > are turned into a high performance API, then new usages will pop up > (eg. I started this because I wanted to do "call single, nowait" calls > for migrating block IO completions back to submitting CPU; however I > am also interested in improving the "call all, wait" case for example > to improve vmalloc tlb flushing). TLB flushing at least on x86-64 should be already well optimized on its own. I would be surprised if you could do much better. > As far as I understand, calling a subset of online CPUs that is not all or > one, is used quite infrequently, so this might be OK. With cpusets and isolation etc. it is the normal case. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html