On Thu, Oct 08, 2009 at 03:12:08PM +0200, Denys Vlasenko wrote:
> On Wed, Oct 7, 2009 at 11:57 PM, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > This, btw, is exactly the kind of thing we saw with some of the
> > non-temporal work, when we used nontemporal stores to copy pages on COW
> > faults, or when doing pre-zeroing of pages. You get rid of some of the
> > hot-spots in the kernel, and you then replace them with user space taking
> > the cache misses in random spots instead. The kernel profile looks better,
> > and system time may go down, but actual performance never went down - you
> > just moved your cache miss cost from one place to another.
>
> A few years ago when K7s were not ancient yet, after hearing
> arguments for and against non-temporal stores,
> I decided to finally figure it out for myself.
>
> I tested a kernel build workload on two kernels with only
> one difference - clear_page with and without non-temporal stores.
>
> The "non-temporal stores" kernel was faster, not slower. Just a little bit,
> but reproducibly.

It is going to be highly dependent on architecture and workload, and on
exactly where you use the nontemporal stores, of course.

I would say that with non-temporal stores in clear_page (a case where we
can often expect the memory to be used again quickly, because it is
anonymous process memory), we are quite likely to cause _more_ activity
on the memory controller and DIMMs, which costs far more power than
cache accesses.
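For readers unfamiliar with the two variants being compared, here is a minimal userspace sketch of "clear_page with and without non-temporal stores". It is NOT the kernel's actual clear_page (which is architecture-specific assembly); the names clear_page_cached, clear_page_nt, demo_clear and SKETCH_PAGE_SIZE are hypothetical, and it assumes 4 KiB pages and an x86 target with SSE2 (other targets fall back to memset). The cached version leaves the zeroed lines in the CPU cache; the non-temporal version hints the CPU to write around the cache, which is why a process touching the page right afterwards may take the misses Linus and Nick describe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define SKETCH_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

/* Ordinary cached stores: the zeroed cache lines stay in the CPU cache. */
static void clear_page_cached(void *page)
{
    memset(page, 0, SKETCH_PAGE_SIZE);
}

/* Non-temporal stores: hint the CPU to bypass the cache, then fence so
 * the streaming stores are ordered before later stores to the page. */
static void clear_page_nt(void *page)
{
#if defined(__SSE2__)
    assert(((uintptr_t)page & 15) == 0); /* _mm_stream_si128 needs 16-byte alignment */
    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)page;
    for (size_t i = 0; i < SKETCH_PAGE_SIZE / sizeof(__m128i); i++)
        _mm_stream_si128(p + i, zero);
    _mm_sfence();
#else
    memset(page, 0, SKETCH_PAGE_SIZE); /* no SSE2: plain cached stores */
#endif
}

/* Hypothetical helper: dirty a page, clear it with the chosen variant,
 * and return 1 if every byte is zero afterwards (-1 on allocation failure). */
static int demo_clear(int use_nt)
{
    unsigned char *page = aligned_alloc(SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
    if (!page)
        return -1;
    memset(page, 0xAA, SKETCH_PAGE_SIZE);
    if (use_nt)
        clear_page_nt(page);
    else
        clear_page_cached(page);
    int ok = 1;
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (page[i] != 0) {
            ok = 0;
            break;
        }
    }
    free(page);
    return ok;
}
```

Both variants produce the same zeroed page; the difference is only where the data ends up afterwards (cache vs. memory), which is exactly what the benchmark and power arguments above turn on.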