On 07/11/2012 02:42 PM, Konrad Rzeszutek Wilk wrote:
>>>> Which architecture was this under? It sounds x86-ish? Is this on
>>>> Westmere and more modern machines? What about Core2 architecture?
>>>>
>>>> Oh how did it work on AMD Phenom boxes?
>>>
>>> I don't have a Phenom box but I have an Athlon X2 I can try out.
>>> I'll get this information next Monday.
>>
>> Actually, I'm running some production stuff on that box, so
>> I'd rather not put testing stuff on it. Is there any
>> particular reason that you wanted this information? Do you
>> have a reason to believe that mapping will be faster than
>> copy for AMD procs?
>
> Sorry for the late response. Working on some ugly bug that is taking
> more time than anticipated.
>
> My thoughts were that these findings are based on the hardware memory
> prefetcher. The Intel machines - especially starting with Nehalem -
> have a pretty impressive prefetcher, where even doing a 'prefetch' on
> the next node of a linked list is not beneficial.
>
> Perhaps the way to leverage this is to use different modes depending
> on the bulk of data? When there is a huge amount use the old method,
> but for small amounts use copy (as it would in theory stay in the
> cache longer).

Not sure what you mean by "bulk" or "huge amount", but the maximum size
of a mapped object is PAGE_SIZE, and the typical size is closer to
PAGE_SIZE/2. So that is what I'm considering. Do you think it makes a
difference with copies that small?

Thanks,
Seth