On Sun, Apr 19, 2020 at 10:49 AM David Laight <David.Laight@xxxxxxxxxx> wrote:
>
> From: Mikulas Patocka
> > Sent: 18 April 2020 16:21
> >
> > On Sat, 18 Apr 2020, David Laight wrote:
> >
> > > From: Mikulas Patocka
> > > > Sent: 17 April 2020 13:47
> > > ...
> > > > Index: linux-2.6/drivers/md/dm-writecache.c
> > > > ===================================================================
> > > > --- linux-2.6.orig/drivers/md/dm-writecache.c 2020-04-17 14:06:35.139999000 +0200
> > > > +++ linux-2.6/drivers/md/dm-writecache.c 2020-04-17 14:06:35.129999000 +0200
> > > > @@ -1166,7 +1166,10 @@ static void bio_copy_block(struct dm_wri
> > > >  			}
> > > >  		} else {
> > > >  			flush_dcache_page(bio_page(bio));
> > > > -			memcpy_flushcache(data, buf, size);
> > > > +			if (likely(size > 512))
> > > > +				memcpy_flushcache_clflushopt(data, buf, size);
> > > > +			else
> > > > +				memcpy_flushcache(data, buf, size);
> > > >
> > > Hmmm... have you looked at how long clflush actually takes?
> > > It isn't too bad if you just do a small number, but using it
> > > to flush large buffers can be very slow.
> >
> > Yes, I have. It's here:
> > http://people.redhat.com/~mpatocka/testcases/pmem/microbenchmarks/pmem.txt
> >
> > sequential write 8 + clflush - 0.3 GB/s on nvdimm
> > sequential write 8 + clflushopt - 1.6 GB/s on nvdimm
> > sequential write-nt 8 bytes - 1.3 GB/s on nvdimm
>
> That table doesn't give enough information to be useful.
> The cpu speed, memory speed and transfer lengths are all relevant.
>
> > > I've an Ivy bridge system where the X-server process requests the
> > > frame buffer be flushed out every 10 seconds (no idea why).
> > > With my 2560x1440 monitor this takes over 3ms.
> > >
> > > This really needs a cond_resched() every few clflush instructions.
> > >
> > > David
> >
> > AFAIK Ivy Bridge doesn't have clflushopt, it only has clflush. clflush
> > only allows one outstanding cache line flush, so it's very slow.
> > clflushopt and clwb relaxed this restriction and there can be multiple
> > cache-invalidation requests in flight until the user serializes it with
> > the sfence instruction.
>
> It isn't that simple.
> While clflush on Ivybridge is slower than clflushopt on newer processors
> both instructions are (relatively) fast for something like 16 or 32
> iterations. After that they get much slower.
> I can't remember where I found the relevant figures, even the ones I
> found didn't show how large the transfers needed to be before the bytes/sec
> became constant.
>
> > The patch checks for clflushopt with
> > "static_cpu_has(X86_FEATURE_CLFLUSHOPT)" and if it is not present, it
> > falls back to non-temporal stores.
>
> Ok, I was expecting you'd be falling back to clflush first.

clflush is a serializing instruction, clflushopt and non-temporal stores are not.

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
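
For readers following the thread, the dispatch discussed above (use clflushopt for copies larger than 512 bytes when the CPU has it, otherwise fall back to non-temporal stores, then fence) can be sketched in userspace roughly as below. This is only an illustration, not the kernel patch: have_clflushopt(), copy_clflushopt(), copy_nt() and copy_flushcache() are hypothetical names, CPUID stands in for static_cpu_has(), and the sketch assumes a 64-byte cache line and an 8-byte-aligned destination.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <cpuid.h>
#include <immintrin.h>

/* CPUID.(EAX=7,ECX=0):EBX bit 23 reports CLFLUSHOPT support. */
static int have_clflushopt(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return (ebx >> 23) & 1;
}

/* Ordinary copy, then flush every cache line the copy touched. */
static __attribute__((target("clflushopt")))
void copy_clflushopt(void *dst, const void *src, size_t len)
{
	uintptr_t p = (uintptr_t)dst & ~(uintptr_t)63;	/* assumed 64-byte lines */
	uintptr_t end = (uintptr_t)dst + len;

	memcpy(dst, src, len);
	for (; p < end; p += 64)
		_mm_clflushopt((void *)p);
}

/* Copy with 8-byte non-temporal stores that bypass the cache. */
static void copy_nt(void *dst, const void *src, size_t len)
{
	long long *d = dst;
	const char *s = src;
	size_t i;

	assert(((uintptr_t)dst & 7) == 0);	/* sketch assumes aligned dst */
	for (i = 0; i + 8 <= len; i += 8) {
		long long v;

		memcpy(&v, s + i, 8);
		_mm_stream_si64(d + i / 8, v);
	}
	/* Tail shorter than 8 bytes: plain copy, not flushed in this sketch. */
	memcpy((char *)dst + i, s + i, len - i);
}

/* Hypothetical dispatcher mirroring the size > 512 test in the hunk. */
static void copy_flushcache(void *dst, const void *src, size_t len)
{
	if (len > 512 && have_clflushopt())
		copy_clflushopt(dst, src, len);
	else
		copy_nt(dst, src, len);
	_mm_sfence();	/* order the weakly-ordered flushes and stores */
}
#else
/* Non-x86 builds: no cache-flush semantics, just copy. */
static void copy_flushcache(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}
#endif
```

The single sfence at the end is what lets multiple flushes or streaming stores stay in flight during the copy. A clflush fallback would instead order each flush against the previous one.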