The 04/22/2020 12:19, Will Deacon wrote:
>
> I wrote the silly harness below for the snippets given in [1] but I can't
> see any difference between the forwards and backwards versions on any arm64
> systems I have access to.
>
> Will
>
> --->8
>
> #include <errno.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <time.h>
> #include <unistd.h>
>
> [...]
> int main(void)
> {
> 	void *buf;
> 	unsigned long long delta;
> 	struct timespec ts_start, ts_end;
>
> 	if (posix_memalign(&buf, PAGE_SIZE, BUF_SZ)) {
> 		perror("posix_memalign()");
> 		return -1;
> 	}
>
> 	memset(buf, 0xd, BUF_SZ);
>
> [...]

With this exact test code I also didn't observe any significant difference
between the forwards and backwards versions on SM8150.

------------------------------------------------------------------------------
Output on 8150 under controlled conditions (CPU0 & 6 turned on, CPUs set to
max frequency and DDR set to performance governor):
------------------------------------------------------------------------------
Forwards:  took 0.319658 seconds
Backwards: took 0.320983 seconds
------------------------------------------------------------------------------

But when I used malloc instead of posix_memalign, which was the big
difference between this harness and our test code, I observed a significant
difference between the forwards and backwards versions on SM8150.

------------------------------------------------------------------------------
Output on 8150 under controlled conditions (CPU0 & 6 turned on, CPUs set to
max frequency and DDR set to performance governor):
------------------------------------------------------------------------------
Forwards:  took 0.323157 seconds
Backwards: took 0.581638 seconds
------------------------------------------------------------------------------

I don't know which implementation differences between posix_memalign and
malloc might lead to these results.

-- 
Prathu Baronia
OnePlus RnD