On Tue, Feb 13, 2018 at 2:00 PM, Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
>
>
> On Fri, 8 Dec 2017, Dan Williams wrote:
>
>> > > > when we write to
>> > > > persistent memory using cached write instructions and use dax_flush
>> > > > afterwards to flush cache for the affected range, the performance is about
>> > > > 350MB/s. It is practically unusable - worse than low-end SSDs.
>> > > >
>> > > > On the other hand, the movnti instruction can sustain performance of one
>> > > > 8-byte write per clock cycle. We don't have to flush cache afterwards, the
>> > > > only thing that must be done is to flush the write-combining buffer with
>> > > > the sfence instruction. Movnti has much better throughput than dax_flush.
>> > >
>> > > What about memcpy_flushcache?
>> >
>> > but
>> >
>> > - using memcpy_flushcache is overkill if we need just one or two 8-byte
>> > writes to the metadata area. Why not use movnti directly?
>> >
>>
>> The driver performs so many 8-byte moves that the cost of the
>> memcpy_flushcache() function call significantly eats into your
>> performance?
>
> I've measured it on Skylake i7-6700 - and the dm-writecache driver has 2%
> lower throughput when it uses memcpy_flushcache() to update its metadata
> instead of explicitly coded "movnti" instructions.
>
> I've created this patch - it doesn't change the API in any way, but it
> optimizes memcpy_flushcache for 4, 8 and 16-byte writes (that is what my
> driver mostly uses). With this patch, I can remove the explicit "asm"
> statements from my driver. Would you consider committing this patch to the
> kernel?
>
> Mikulas
>
>

Yes, this looks good to me. You can send it to the x86 folks with my:

Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>

...or let me know and I can chase it through the -tip tree. Either way
works for me.

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
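
For readers following the thread, here is a minimal userspace sketch of the idea discussed above: special-case 4, 8 and 16-byte copies so they become one or two movnti stores, and issue a single sfence afterwards to drain the write-combining buffer. This is not the patch itself and not the kernel's memcpy_flushcache(). The names copy_nt_small(), nt_write_barrier() and copy_nt_small_selfcheck() are hypothetical, the compiler intrinsics stand in for hand-written asm, and the fallback path is a plain memcpy() that does not bypass the cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <emmintrin.h>  /* _mm_stream_si32, _mm_stream_si64, _mm_sfence */
#define HAVE_MOVNTI 1
#else
#define HAVE_MOVNTI 0
#endif

/*
 * Hypothetical helper (illustration only): copy len bytes to dst,
 * using non-temporal movnti stores for the 4, 8 and 16-byte cases.
 * Other sizes take a plain memcpy(), which is NOT cache-bypassing;
 * a real implementation would need a flushing path there.
 */
static inline void copy_nt_small(void *dst, const void *src, size_t len)
{
#if HAVE_MOVNTI
    uint64_t q[2];
    uint32_t d;

    switch (len) {
    case 4:
        memcpy(&d, src, 4);                 /* load without aliasing issues */
        _mm_stream_si32((int *)dst, (int)d);
        return;
    case 8:
        memcpy(q, src, 8);
        _mm_stream_si64((long long *)dst, (long long)q[0]);
        return;
    case 16:
        memcpy(q, src, 16);
        _mm_stream_si64((long long *)dst, (long long)q[0]);
        _mm_stream_si64((long long *)dst + 1, (long long)q[1]);
        return;
    default:
        break;
    }
#endif
    memcpy(dst, src, len);
}

/* Order the preceding non-temporal stores (sfence on x86-64, no-op elsewhere). */
static inline void nt_write_barrier(void)
{
#if HAVE_MOVNTI
    _mm_sfence();
#endif
}

/* Intended usage: any number of small copies, then one barrier. */
static inline int copy_nt_small_selfcheck(void)
{
    uint64_t src[2] = { 0x1122334455667788ULL, 0x99aabbccddeeff00ULL };
    uint64_t dst[2] = { 0, 0 };

    copy_nt_small(dst, src, sizeof(src));
    nt_write_barrier();
    assert(memcmp(dst, src, sizeof(src)) == 0);
    return 0;
}
```

When len is a compile-time constant at the call site and the helper is inlined, the compiler can typically resolve the switch at build time, so an 8-byte metadata update reduces to a single store with no function-call overhead, which is the cost being debated in the thread.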