> On Aug 26, 2022, at 5:11 PM, Zi Yan <ziy@xxxxxxxxxx> wrote:
>
> On 25 Aug 2022, at 17:30, alexlzhu@xxxxxx wrote:
>
> How large is the memory? Just wonder the scanning speed.
> Also, it might be better to explicitly add the time unit, second,
> in the output.

The size of memory was 65GB on the test machine I obtained these
numbers on. I’ll take note of adding the time unit. Thanks!

> Is it possible to use cache-bypassing read to avoid cache
> pollution? You are scanning for 256*2M at a time. Wouldn’t that
> wipe out all the useful data in the cache?

I have only found non-temporal writes in arch/x86/, not non-temporal
reads (with MOVNTDQA).

I suppose we should figure out why nobody ever bothered using
non-temporal reads on x86 before trying to make this code use them.

A quick search of the internet suggests that non-temporal reads are
not being used on x86 because people could not show a performance
improvement by using them, but maybe somebody here has more insight?
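
For reference, a non-temporal read in user space might look roughly
like the sketch below. It uses the SSE4.1 intrinsic
_mm_stream_load_si128(), which maps to MOVNTDQA. The helper name
buf_is_zero_nt() is hypothetical and nothing like it exists in the
tree; the sketch assumes a 16-byte-aligned buffer whose length is a
multiple of 16. Kernel code would additionally have to wrap any SSE
use in kernel_fpu_begin()/kernel_fpu_end(). Whether the non-temporal
hint makes any measurable difference on ordinary cacheable memory is
the open question here, so this only illustrates the mechanics.

```c
#include <immintrin.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space helper, not kernel code.
 * Returns true if buf contains only zero bytes.
 * Assumes buf is 16-byte aligned and len is a multiple of 16.
 */
__attribute__((target("sse4.1")))
static bool buf_is_zero_nt(const void *buf, size_t len)
{
	const __m128i *p = buf;
	__m128i acc = _mm_setzero_si128();
	size_t i;

	/* OR all 16-byte chunks together, loading each with MOVNTDQA. */
	for (i = 0; i < len / 16; i++)
		acc = _mm_or_si128(acc, _mm_stream_load_si128((__m128i *)&p[i]));

	/* _mm_testz_si128() returns 1 when (acc & acc) is all zeros. */
	return _mm_testz_si128(acc, acc);
}
```

A real scanner would probably want to bail out at the first non-zero
chunk instead of reading the whole range; that is left out to keep the
sketch short.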