On Tue, Sep 13, 2022 at 2:45 PM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>
> > On Sep 13, 2022, at 11:01 AM, Anna Schumaker <anna@xxxxxxxxxx> wrote:
> >
> > From: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx>
> >
> > When we left off with READ_PLUS, Chuck had suggested reverting the
> > server to reply with a single NFS4_CONTENT_DATA segment essentially
> > mimicking how the READ operation behaves. Then, a future sparse read
> > function can be added and the server modified to support it without
> > needing to rip out the old READ_PLUS code at the same time.
> >
> > This patch takes that first step. I was even able to re-use the
> > nfsd4_encode_readv() and nfsd4_encode_splice_read() functions to
> > remove some duplicate code.
> >
> > Below is some performance data comparing the READ and READ_PLUS
> > operations with v4.2. I tested reading 2G files with various hole
> > lengths including 100% data, 100% hole, and a handful of mixed hole and
> > data files. For the mixed files, a notation like "1d" means
> > every-other-page is data, and the first page is data. "4h" would mean
> > alternating 4 pages data and 4 pages hole, beginning with hole.
> >
> > I also used the 'vmtouch' utility to make sure the file is either
> > evicted from the server's page cache ("Uncached on server") or present in
> > the server's page cache ("Cached on server").
> >
> > 2048M-data
> >    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 3.555 s, 712 MB/s, 0.74 s kern, 24% cpu
> >    :    :........................... Cached on server ..... 1.346 s, 1.6 GB/s, 0.69 s kern, 52% cpu
> >    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 3.596 s, 690 MB/s, 0.72 s kern, 23% cpu
> >         :........................... Cached on server ..... 1.394 s, 1.6 GB/s, 0.67 s kern, 48% cpu
> > 2048M-hole
> >    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 4.934 s, 762 MB/s, 1.86 s kern, 29% cpu
> >    :    :........................... Cached on server ..... 1.328 s, 1.6 GB/s, 0.72 s kern, 54% cpu
> >    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 4.823 s, 739 MB/s, 1.88 s kern, 28% cpu
> >         :........................... Cached on server ..... 1.399 s, 1.5 GB/s, 0.70 s kern, 50% cpu
> > 2048M-mixed-1d
> >    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 4.480 s, 598 MB/s, 0.76 s kern, 21% cpu
> >    :    :........................... Cached on server ..... 1.445 s, 1.5 GB/s, 0.71 s kern, 50% cpu
> >    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 4.774 s, 559 MB/s, 0.75 s kern, 19% cpu
> >         :........................... Cached on server ..... 1.514 s, 1.4 GB/s, 0.67 s kern, 44% cpu
> > 2048M-mixed-1h
> >    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 3.568 s, 633 MB/s, 0.78 s kern, 23% cpu
> >    :    :........................... Cached on server ..... 1.357 s, 1.6 GB/s, 0.71 s kern, 53% cpu
> >    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 3.580 s, 641 MB/s, 0.74 s kern, 22% cpu
> >         :........................... Cached on server ..... 1.396 s, 1.5 GB/s, 0.67 s kern, 48% cpu
> > 2048M-mixed-2d
> >    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 3.159 s, 708 MB/s, 0.78 s kern, 26% cpu
> >    :    :........................... Cached on server ..... 1.410 s, 1.5 GB/s, 0.70 s kern, 50% cpu
> >    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 3.093 s, 712 MB/s, 0.74 s kern, 25% cpu
> >         :........................... Cached on server ..... 1.474 s, 1.4 GB/s, 0.67 s kern, 46% cpu
> > 2048M-mixed-2h
> >    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 3.043 s, 722 MB/s, 0.78 s kern, 26% cpu
> >    :    :........................... Cached on server ..... 1.374 s, 1.6 GB/s, 0.72 s kern, 53% cpu
> >    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 2.913 s, 756 MB/s, 0.74 s kern, 26% cpu
> >         :........................... Cached on server ..... 1.349 s, 1.6 GB/s, 0.67 s kern, 50% cpu
> > 2048M-mixed-4d
> >    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 3.275 s, 680 MB/s, 0.75 s kern, 24% cpu
> >    :    :........................... Cached on server ..... 1.391 s, 1.5 GB/s, 0.71 s kern, 52% cpu
> >    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 3.470 s, 626 MB/s, 0.72 s kern, 21% cpu
> >         :........................... Cached on server ..... 1.456 s, 1.5 GB/s, 0.67 s kern, 46% cpu
> > 2048M-mixed-4h
> >    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 3.035 s, 743 MB/s, 0.74 s kern, 26% cpu
> >    :    :........................... Cached on server ..... 1.345 s, 1.6 GB/s, 0.71 s kern, 53% cpu
> >    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 2.848 s, 779 MB/s, 0.73 s kern, 26% cpu
> >         :........................... Cached on server ..... 1.421 s, 1.5 GB/s, 0.68 s kern, 48% cpu
> > 2048M-mixed-8d
> >    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 3.262 s, 687 MB/s, 0.74 s kern, 24% cpu
> >    :    :........................... Cached on server ..... 1.366 s, 1.6 GB/s, 0.69 s kern, 51% cpu
> >    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 3.195 s, 709 MB/s, 0.72 s kern, 24% cpu
> >         :........................... Cached on server ..... 1.414 s, 1.5 GB/s, 0.67 s kern, 48% cpu
> > 2048M-mixed-8h
> >    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 2.899 s, 789 MB/s, 0.73 s kern, 27% cpu
> >    :    :........................... Cached on server ..... 1.338 s, 1.6 GB/s, 0.69 s kern, 52% cpu
> >    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 2.910 s, 772 MB/s, 0.72 s kern, 26% cpu
> >         :........................... Cached on server ..... 1.438 s, 1.5 GB/s, 0.67 s kern, 47% cpu
> > 2048M-mixed-16d
> >    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 3.416 s, 661 MB/s, 0.73 s kern, 23% cpu
> >    :    :........................... Cached on server ..... 1.345 s, 1.6 GB/s, 0.70 s kern, 53% cpu
> >    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 3.177 s, 713 MB/s, 0.70 s kern, 23% cpu
> >         :........................... Cached on server ..... 1.447 s, 1.5 GB/s, 0.68 s kern, 47% cpu
> > 2048M-mixed-16h
> >    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 2.919 s, 780 MB/s, 0.73 s kern, 26% cpu
> >    :    :........................... Cached on server ..... 1.363 s, 1.6 GB/s, 0.70 s kern, 51% cpu
> >    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 2.934 s, 773 MB/s, 0.70 s kern, 25% cpu
> >         :........................... Cached on server ..... 1.435 s, 1.5 GB/s, 0.67 s kern, 47% cpu
>
> For this particular change, I'm interested only in cases where the
> whole file is cached on the server. We're focusing on the efficiency
> and performance of the protocol and transport here, not the underlying
> filesystem (which is... xfs?).

Sounds good, I can narrow down to just that test.

>
> Also, 2GB files can be read with only about 2,000 1MB READ requests.
> That means we don't have a large sample size of READ operations for any
> single test, assuming the client is using 1MB rsize.
>
> Also, are these averages, or single runs? I think running each test
> 5-10 times (at least) and including some variance data in the results
> would help build more confidence that the small differences in the
> timing are not noise.

This is an average across 10 runs.

>
> All that said, however, I see with some consistency that READ_PLUS
> takes longer to pull data over the wire, but uses slightly less CPU.
> Assuming the CPU utilizations are client-side, that matches my
> expectations of lower CPU utilization results if the throughput is
> lower.
>
> Looking at the 100% data results, READ_PLUS takes 3.5% longer than
> READ. That to me is a small but significant drop -- I think it will
> be noticeable for large workloads. Can you explain the difference?

I'll try larger files for my next round of testing. I was assuming the
difference is just noise, since there are cases like the mixed-2h test
where READ_PLUS was slightly faster. But more testing will help figure
that out.
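
For the variance question, here is a rough sketch of how the per-run numbers could be summarized and how the relative gap between READ and READ_PLUS is computed. The two averages are the cached 2048M-data times from the table; the per-run list is made up purely to show the shape of the output, and the helper names are just placeholders, not part of any existing test script.

```python
import statistics


def summarize(times):
    """Return (mean, sample standard deviation) of elapsed times in seconds."""
    return statistics.mean(times), statistics.stdev(times)


def relative_slowdown(baseline, candidate):
    """Percent by which the candidate's elapsed time exceeds the baseline's."""
    return (candidate - baseline) / baseline * 100.0


# Reported averages for 2048M-data, cached on server (from the table).
read_avg = 1.346
read_plus_avg = 1.394
slowdown = relative_slowdown(read_avg, read_plus_avg)
print(f"READ_PLUS takes {slowdown:.2f}% longer than READ")

# HYPOTHETICAL per-run timings, not measured data; real runs would go here.
example_runs = [1.34, 1.36, 1.33, 1.35, 1.35]
mean, stdev = summarize(example_runs)
print(f"mean {mean:.3f} s, stdev {stdev:.3f} s over {len(example_runs)} runs")
```

If the standard deviation across runs turns out to be comparable to the gap between the two averages, the difference is probably noise; if it is much smaller, the gap is likely real.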
>
> For subsequent test runs, can you find a server with more memory,
> test with larger files, and test with a variety of rsize settings?
> You can reduce your test matrix by leaving out the tests with holey
> files for the moment.

Sure thing!
Anna

>
> >
> > - v4:
> >   - Change READ and READ_PLUS to return nfserr_serverfault if the
> >     splice check fails.
>
> At this point, the code looks fine, but I'd like to understand why
> the performance is not the same.
>
>
> > Thanks,
> > Anna
> >
> >
> > Anna Schumaker (2):
> >  NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
> >  NFSD: Simplify READ_PLUS
> >
> > fs/nfsd/nfs4xdr.c | 141 +++++++++++----------------------------------
> > 1 file changed, 33 insertions(+), 108 deletions(-)
> >
> > --
> > 2.37.3
> >
>
> --
> Chuck Lever
>
>
>
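
As a sanity check on how the sample size scales with file size and rsize, the number of READ operations per test can be estimated with a ceiling division. This is only a sketch: the file sizes and rsize values below are illustrative guesses, not a test matrix anyone in this thread has agreed on, and it assumes the client issues full-rsize requests with no coalescing or readahead effects.

```python
def read_ops(file_bytes, rsize_bytes):
    """Number of READ requests needed to cover file_bytes at rsize_bytes each."""
    return -(-file_bytes // rsize_bytes)  # ceiling division


KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB

# ILLUSTRATIVE values only (assumptions, not settings taken from the thread).
file_sizes = [2 * GiB, 8 * GiB, 32 * GiB]
rsizes = [64 * KiB, 256 * KiB, 1 * MiB]

for size in file_sizes:
    for rsize in rsizes:
        ops = read_ops(size, rsize)
        print(f"{size // GiB:3d} GiB file, rsize {rsize // KiB:5d} KiB: {ops} READs")
```

A 2 GiB file at 1 MiB rsize works out to 2048 requests, so larger files or smaller rsize values both increase the number of operations per run.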