On Thu, 21 Jun 2018, Ingo Molnar wrote:

>
> * Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
>
> > From: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> > Subject: [PATCH v2] x86: optimize memcpy_flushcache
> >
> > In the context of constant short length stores to persistent memory,
> > memcpy_flushcache suffers from a 2% performance degradation compared to
> > explicitly using the "movnti" instruction.
> >
> > Optimize 4, 8, and 16 byte memcpy_flushcache calls to explicitly use the
> > movnti instruction with inline assembler.
>
> Linus requested asm optimizations to include actual benchmarks, so it would be
> nice to describe how this was tested, on what hardware, and what the before/after
> numbers are.
>
> Thanks,
>
> 	Ingo

It was tested on a 4-core Skylake machine, with persistent memory
emulated using the memmap kernel option. The dm-writecache target used
the emulated persistent memory as a cache and a SATA SSD as the backing
device. With the patch, throughput when writing data with dd improved
by 2%.

I don't have access to the machine anymore.

Mikulas

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
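For readers unfamiliar with the change, the fast path described in the patch (explicit movnti stores for 4, 8 and 16 byte copies) might be sketched in user space roughly as follows. This is a minimal illustration under stated assumptions, not the patch itself: it assumes GCC/Clang extended inline assembly on x86, the helper names (`nt_store32`, `nt_store64`, `copy_flushcache_sketch`, `flush_fence`) are hypothetical, the generic path for other sizes is stood in for by plain `memcpy`, and non-x86 builds fall back to ordinary stores so the sketch still compiles.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 4-byte non-temporal store (movnti) on x86. */
static inline void nt_store32(void *dst, uint32_t v)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("movntil %1, %0" : "=m"(*(uint32_t *)dst) : "r"(v));
#else
    memcpy(dst, &v, sizeof(v)); /* fallback: ordinary store */
#endif
}

/* Hypothetical helper: 8-byte non-temporal store (movnti) on x86-64. */
static inline void nt_store64(void *dst, uint64_t v)
{
#if defined(__x86_64__)
    __asm__ volatile("movntiq %1, %0" : "=m"(*(uint64_t *)dst) : "r"(v));
#else
    memcpy(dst, &v, sizeof(v)); /* fallback: ordinary store */
#endif
}

/* Hypothetical helper: order the non-temporal stores before later stores. */
static inline void flush_fence(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("sfence" ::: "memory");
#else
    __asm__ volatile("" ::: "memory"); /* compiler barrier only */
#endif
}

/* Sketch of the short-length fast path: 4, 8 and 16 byte copies use
 * movnti directly; every other size goes to a stand-in generic copy. */
static void copy_flushcache_sketch(void *dst, const void *src, size_t cnt)
{
    uint32_t v32;
    uint64_t v64;

    switch (cnt) {
    case 4:
        memcpy(&v32, src, 4);
        nt_store32(dst, v32);
        return;
    case 8:
        memcpy(&v64, src, 8);
        nt_store64(dst, v64);
        return;
    case 16:
        memcpy(&v64, src, 8);
        nt_store64(dst, v64);
        memcpy(&v64, (const char *)src + 8, 8);
        nt_store64((char *)dst + 8, v64);
        return;
    default:
        /* Stand-in for the generic flushcache copy path (assumption). */
        memcpy(dst, src, cnt);
        return;
    }
}
```

In the kernel the size dispatch presumably only pays off when the length is a compile-time constant, so the switch folds away at each call site; in this sketch it is an ordinary runtime switch. The fence is kept separate because a caller would normally batch several stores before ordering them.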