On Wed, Oct 7, 2009 at 11:57 PM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> This, btw, is exactly the kind of thing we saw with some of the
> non-temporal work, when we used nontemporal stores to copy pages on COW
> faults, or when doing pre-zeroing of pages. You get rid of some of the
> hot-spots in the kernel, and you then replace them with user space taking
> the cache misses in random spots instead. The kernel profile looks better,
> and system time may go down, but actual performance never went down - you
> just moved your cache miss cost from one place to another.

A few years ago, when K7s were not yet ancient, after hearing arguments
for and against non-temporal stores, I decided to finally figure it out
for myself. I tested a kernel build workload on two kernels with only one
difference: clear_page with and without non-temporal stores.

The "non-temporal stores" kernel was faster, not slower. Just a little
bit, but reproducibly.
--
vda
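
For anyone who has not looked at the two variants being compared, here is a rough user-space sketch of the difference. This is not the kernel's clear_page code, just an illustration of the idea. The names clear_page_cached, clear_page_nt and PAGE_SIZE_BYTES are made up for this example, and it assumes a 4096-byte page and an x86 CPU with SSE2 (on anything else the second function simply falls back to memset).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_setzero_si128, _mm_stream_si128, _mm_sfence */
#endif

#define PAGE_SIZE_BYTES 4096  /* assumed page size, for illustration only */

/* Ordinary clear: the zeroed cache lines are pulled into the cache. */
static void clear_page_cached(void *page)
{
    memset(page, 0, PAGE_SIZE_BYTES);
}

/*
 * Non-temporal clear: streaming stores hint that the zeroed data should
 * bypass the cache. The page must be 16-byte aligned.
 */
static void clear_page_nt(void *page)
{
    assert(((uintptr_t)page & 15) == 0);
#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)page;
    size_t i;

    for (i = 0; i < PAGE_SIZE_BYTES / sizeof(__m128i); i++)
        _mm_stream_si128(p + i, zero);

    /* Order the streaming stores before any later stores. */
    _mm_sfence();
#else
    /* Hypothetical fallback for non-SSE2 targets: plain cached clear. */
    memset(page, 0, PAGE_SIZE_BYTES);
#endif
}
```

Both functions leave the page in the same state; they differ only in whether the zeroed lines end up in the cache, which is the point argued over in this thread.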