On Wed, Feb 27, 2008 at 02:04:18PM +0100, Andi Kleen wrote: > > > On a 2 socket, 8 core system, I see anywhere up to nearly 16x better > > performance on a stress test. The common cases of call-all, and wait > > are improved the least, however I think that if call-single and nowait > > are turned into a high performance API, then new usages will pop up > > (eg. I started this because I wanted to do "call single, nowait" calls > > for migrating block IO completions back to submitting CPU; however I > > am also interested in improving the "call all, wait" case for example > > to improve vmalloc tlb flushing). > > TLB flushing at least on x86-64 should be already well optimized on its > own. I would be surprised if you could do much better. *vmalloc* TLB flushing. void flush_tlb_all(void) { on_each_cpu(do_flush_tlb_all, NULL, 1, 1); } Of course we could use a new vector for it and speed it up a lot more, but after my vmalloc improvements I think that would be a waste of a vector at this point. > > As far as I understand, calling a subset of online CPUs that is not all or > > one, is used quite infrequently, so this might be OK. > > With cpusets and isolation etc. it is the normal case. Oh really? Coming from what callers? - To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html