On Sat, 18 Apr 2020, David Laight wrote:

> From: Mikulas Patocka
> > Sent: 17 April 2020 13:47
> ...
> > Index: linux-2.6/drivers/md/dm-writecache.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/dm-writecache.c 2020-04-17 14:06:35.139999000 +0200
> > +++ linux-2.6/drivers/md/dm-writecache.c 2020-04-17 14:06:35.129999000 +0200
> > @@ -1166,7 +1166,10 @@ static void bio_copy_block(struct dm_wri
> >                          }
> >                  } else {
> >                          flush_dcache_page(bio_page(bio));
> > -                        memcpy_flushcache(data, buf, size);
> > +                        if (likely(size > 512))
> > +                                memcpy_flushcache_clflushopt(data, buf, size);
> > +                        else
> > +                                memcpy_flushcache(data, buf, size);
>
> Hmmm... have you looked at how long clflush actually takes?
> It isn't too bad if you just do a small number, but using it
> to flush large buffers can be very slow.

Yes, I have. It's here:
http://people.redhat.com/~mpatocka/testcases/pmem/microbenchmarks/pmem.txt

sequential write 8 + clflush    - 0.3 GB/s on nvdimm
sequential write 8 + clflushopt - 1.6 GB/s on nvdimm
sequential write-nt 8 bytes     - 1.3 GB/s on nvdimm

> I've an Ivy bridge system where the X-server process requests the
> frame buffer be flushed out every 10 seconds (no idea why).
> With my 2560x1440 monitor this takes over 3ms.
>
> This really needs a cond_resched() every few clflush instructions.
>
> David

AFAIK Ivy Bridge doesn't have clflushopt, it only has clflush. clflush
only allows one outstanding cache line flush, so it's very slow.
clflushopt and clwb relaxed this restriction, and there can be multiple
cache-invalidation requests in flight until the user serializes them
with the sfence instruction.

The patch checks for clflushopt with
"static_cpu_has(X86_FEATURE_CLFLUSHOPT)" and if it is not present, it
falls back to non-temporal stores.

Mikulas

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
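For readers who want to try the idea outside the kernel, here is a rough userspace sketch of the dispatch described above. It is not the kernel's memcpy_flushcache_clflushopt; the names copy_flushcache, copy_clflushopt, copy_nontemporal and cpu_has_clflushopt are made up for illustration, the 64-byte cache line size is an assumption, and the 512-byte threshold is simply taken from the quoted patch. It assumes GCC or clang on x86-64 and degrades to a plain memcpy elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <cpuid.h>
#include <immintrin.h>

#define CACHE_LINE 64	/* assumed cache line size */

/* CPUID leaf 7, subleaf 0, EBX bit 23 reports CLFLUSHOPT support. */
static int cpu_has_clflushopt(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return (ebx >> 23) & 1;
}

/*
 * Ordinary copy, then flush every cache line that was touched.
 * clflushopt flushes are not ordered against each other, so several
 * can be in flight; a single sfence at the end orders them.
 */
__attribute__((target("clflushopt")))
static void copy_clflushopt(void *dst, const void *src, size_t size)
{
	uintptr_t p = (uintptr_t)dst & ~(uintptr_t)(CACHE_LINE - 1);
	uintptr_t end = (uintptr_t)dst + size;

	memcpy(dst, src, size);
	for (; p < end; p += CACHE_LINE)
		_mm_clflushopt((void *)p);
	_mm_sfence();
}

/* Fallback: 8-byte non-temporal stores that bypass the cache. */
static void copy_nontemporal(void *dst, const void *src, size_t size)
{
	char *d = dst;
	const char *s = src;
	size_t i = 0;

	for (; i + 8 <= size; i += 8) {
		long long v;

		memcpy(&v, s + i, sizeof(v));
		_mm_stream_si64((long long *)(d + i), v);
	}
	if (i < size) {
		/* Tail shorter than 8 bytes: normal stores plus clflush. */
		memcpy(d + i, s + i, size - i);
		_mm_clflush(d + i);
		_mm_clflush(d + size - 1);
	}
	_mm_sfence();
}
#endif

/* Pick the strategy the way the quoted patch does: by size and CPU feature. */
void copy_flushcache(void *dst, const void *src, size_t size)
{
	assert(dst != NULL && src != NULL);
	if (size == 0)
		return;
#if defined(__x86_64__)
	if (size > 512 && cpu_has_clflushopt())
		copy_clflushopt(dst, src, size);
	else
		copy_nontemporal(dst, src, size);
#else
	memcpy(dst, src, size);	/* placeholder: no flushing on non-x86 */
#endif
}
```

On ordinary DRAM this only demonstrates the instruction sequence; the throughput figures quoted above were measured on nvdimm and will not reproduce with a stack or heap buffer.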