On 5/23/22 9:12 AM, Jens Axboe wrote:
>> Current branch pushed to #new.iov_iter (at the moment; will rename
>> back to work.iov_iter once it gets more or less stable).
> 
> Sounds good, I'll see what I need to rebase.

On the previous branch, I ran a few quick numbers: dd from /dev/zero to
/dev/null, with /dev/zero using ->read() as it does by default:

Size	BW
32	260MB/sec
1k	6.6GB/sec
4k	17.9GB/sec
16k	28.8GB/sec

Now comment out ->read() so it uses ->read_iter() instead:

Size	BW
32	259MB/sec
1k	6.6GB/sec
4k	18.0GB/sec
16k	28.6GB/sec

which are roughly identical, all things considered. Just a sanity
check, but it looks good from a performance POV in this basic test.

Now let's do ->read_iter(), but make iov_iter_zero() copy from the zero
page instead:

Size	BW
32	250MB/sec
1k	7.7GB/sec
4k	28.8GB/sec
16k	71.2GB/sec

It's a tad slower at 32 bytes, considerably better at 1k, and massively
better at page size and above. This is on an Intel 12900K, so a recent
CPU.

Let's try cacheline size and above:

Size	Method			BW
64	copy_from_zero()	508MB/sec
128	copy_from_zero()	1.0GB/sec
64	clear_user()		513MB/sec
128	clear_user()		1.0GB/sec

Something like the below may make sense to do; the wins at bigger sizes
are substantial, and that gets me the best of both worlds. If we really
care, we could move the check earlier and not have it per-segment. I
doubt it matters in practice, though.

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index e93fcfcf2176..f4b80ef446b9 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1049,12 +1049,19 @@ static size_t pipe_zero(size_t bytes, struct iov_iter *i)
 	return bytes;
 }
 
+static unsigned long copy_from_zero(void __user *buf, size_t len)
+{
+	if (len >= 128)
+		return copy_to_user(buf, page_address(ZERO_PAGE(0)), len);
+	return clear_user(buf, len);
+}
+
 size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
 {
 	if (unlikely(iov_iter_is_pipe(i)))
 		return pipe_zero(bytes, i);
 	iterate_and_advance(i, bytes, base, len, count,
-		clear_user(base, len),
+		copy_from_zero(base, len),
 		memset(base, 0, len)
 	)

-- 
Jens Axboe
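
For reference, a rough userspace stand-in for the dd test above could
look like the sketch below: it just read()s /dev/zero in fixed-size
chunks and reports throughput. The program name, the 1 GiB transfer
size, and the timing details are illustrative choices, not part of the
patch or the original test setup.

/*
 * zero_read_bench.c - rough approximation of
 *   dd if=/dev/zero of=/dev/null bs=<size>
 * Reads 1 GiB from /dev/zero in <size>-byte chunks and prints MB/sec.
 *
 * Build: gcc -O2 -o zero_read_bench zero_read_bench.c
 * Run:   ./zero_read_bench 32
 *        ./zero_read_bench 4096
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	size_t bs = argc > 1 ? strtoul(argv[1], NULL, 0) : 4096;
	size_t total = 1ULL << 30;	/* 1 GiB per run */
	char *buf = malloc(bs);
	struct timespec t0, t1;
	size_t done = 0;
	double secs;
	int fd;

	if (!buf)
		return 1;
	fd = open("/dev/zero", O_RDONLY);
	if (fd < 0)
		return 1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	while (done < total) {
		ssize_t ret = read(fd, buf, bs);

		if (ret <= 0)
			return 1;
		done += ret;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	printf("bs=%zu: %.1f MB/sec\n", bs, done / secs / (1024 * 1024));

	close(fd);
	free(buf);
	return 0;
}

Running it with 32, 1024, 4096, and 16384 as the argument mirrors the
block sizes used in the tables above.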