Re: [PATCH 08/13] fs: add read support for RWF_UNCACHED

Stefan Metzmacher <metze@xxxxxxxxx> · Mon, 11 Nov 2024 14:04:13 +0100

Hi Jens,

> If the same test case is run with RWF_UNCACHED set for the buffered read,
> the output looks as follows:
>
> Reading bs 65536, uncached 1
>    1s: 153144MB/sec
>    2s: 156760MB/sec
>    3s: 158110MB/sec
>    4s: 158009MB/sec
>    5s: 158043MB/sec
>    6s: 157638MB/sec
>    7s: 157999MB/sec
>    8s: 158024MB/sec
>    9s: 157764MB/sec
>   10s: 157477MB/sec
>   11s: 157417MB/sec
>   12s: 157455MB/sec
>   13s: 157233MB/sec
>   14s: 156692MB/sec
>
> which is just chugging along at ~155GB/sec of read performance. Looking
> at top, we see:
>
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>  7961 root      20   0  267004      0      0 S  3180   0.0   5:37.95 uncached
>  8024 axboe     20   0   14292   4096      0 R   1.0   0.0   0:00.13 top
>
> where just the test app is using CPU, no reclaim is taking place outside
> of the main thread. Not only is performance 65% better, it's also using
> half the CPU to do it.

Do you have numbers for the same test using O_DIRECT, just to
see the impact of the memcpy from the page cache to the userspace
buffer?
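
For reference, the comparison could be sketched roughly as below. This is
only an illustration, not the test app from the patch: read_all() is a
hypothetical helper, the RWF_UNCACHED value is an assumption taken from the
proposed uapi change (it is not in released headers), and the 4096-byte
buffer alignment used for O_DIRECT is likewise assumed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumption: value from the proposed patch, not from released headers. */
#ifndef RWF_UNCACHED
#define RWF_UNCACHED 0x00000080
#endif

/*
 * Hypothetical helper: read the whole file in bs-sized chunks and return
 * the number of bytes read, or -1 on error.
 *   open_flags: extra open(2) flags, e.g. O_DIRECT
 *   rwf_flags:  preadv2(2) flags, e.g. RWF_UNCACHED
 */
static long long read_all(const char *path, size_t bs, int open_flags,
			  int rwf_flags)
{
	void *buf;
	long long total = 0;
	off_t off = 0;
	int fd;

	/* O_DIRECT needs an aligned buffer; 4096 is an assumed alignment. */
	if (posix_memalign(&buf, 4096, bs))
		return -1;

	fd = open(path, O_RDONLY | open_flags);
	if (fd < 0) {
		free(buf);
		return -1;
	}

	for (;;) {
		struct iovec iov = { .iov_base = buf, .iov_len = bs };
		ssize_t ret = preadv2(fd, &iov, 1, off, rwf_flags);

		if (ret < 0 && errno == EINTR)
			continue;	/* interrupted, retry the same offset */
		if (ret < 0) {
			total = -1;	/* e.g. flag not supported by the kernel */
			break;
		}
		if (ret == 0)
			break;		/* end of file */
		total += ret;
		off += ret;
	}

	close(fd);
	free(buf);
	return total;
}
```

Timing read_all(path, 65536, 0, 0), read_all(path, 65536, 0, RWF_UNCACHED)
and read_all(path, 65536, O_DIRECT, 0) on the same file would give the three
data points. Kernels without the patch are expected to reject the unknown
flag, in which case the helper returns -1.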

Thanks!
metze