> On Sep 13, 2022, at 11:01 AM, Anna Schumaker <anna@xxxxxxxxxx> wrote:
>
> From: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx>
>
> When we left off with READ_PLUS, Chuck had suggested reverting the
> server to reply with a single NFS4_CONTENT_DATA segment essentially
> mimicking how the READ operation behaves. Then, a future sparse read
> function can be added and the server modified to support it without
> needing to rip out the old READ_PLUS code at the same time.
>
> This patch takes that first step. I was even able to re-use the
> nfsd4_encode_readv() and nfsd4_encode_splice_read() functions to
> remove some duplicate code.
>
> Below is some performance data comparing the READ and READ_PLUS
> operations with v4.2. I tested reading 2G files with various hole
> lengths including 100% data, 100% hole, and a handful of mixed hole and
> data files. For the mixed files, a notation like "1d" means
> every-other-page is data, and the first page is data. "4h" would mean
> alternating 4 pages data and 4 pages hole, beginning with hole.
>
> I also used the 'vmtouch' utility to make sure the file is either
> evicted from the server's pagecache ("Uncached on server") or present in
> the server's page cache ("Cached on server").
>
> 2048M-data
>    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 3.555 s, 712 MB/s, 0.74 s kern, 24% cpu
>    :    :........................... Cached on server ..... 1.346 s, 1.6 GB/s, 0.69 s kern, 52% cpu
>    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 3.596 s, 690 MB/s, 0.72 s kern, 23% cpu
>         :........................... Cached on server ..... 1.394 s, 1.6 GB/s, 0.67 s kern, 48% cpu
> 2048M-hole
>    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 4.934 s, 762 MB/s, 1.86 s kern, 29% cpu
>    :    :........................... Cached on server ..... 1.328 s, 1.6 GB/s, 0.72 s kern, 54% cpu
>    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 4.823 s, 739 MB/s, 1.88 s kern, 28% cpu
>         :........................... Cached on server ..... 1.399 s, 1.5 GB/s, 0.70 s kern, 50% cpu
> 2048M-mixed-1d
>    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 4.480 s, 598 MB/s, 0.76 s kern, 21% cpu
>    :    :........................... Cached on server ..... 1.445 s, 1.5 GB/s, 0.71 s kern, 50% cpu
>    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 4.774 s, 559 MB/s, 0.75 s kern, 19% cpu
>         :........................... Cached on server ..... 1.514 s, 1.4 GB/s, 0.67 s kern, 44% cpu
> 2048M-mixed-1h
>    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 3.568 s, 633 MB/s, 0.78 s kern, 23% cpu
>    :    :........................... Cached on server ..... 1.357 s, 1.6 GB/s, 0.71 s kern, 53% cpu
>    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 3.580 s, 641 MB/s, 0.74 s kern, 22% cpu
>         :........................... Cached on server ..... 1.396 s, 1.5 GB/s, 0.67 s kern, 48% cpu
> 2048M-mixed-2d
>    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 3.159 s, 708 MB/s, 0.78 s kern, 26% cpu
>    :    :........................... Cached on server ..... 1.410 s, 1.5 GB/s, 0.70 s kern, 50% cpu
>    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 3.093 s, 712 MB/s, 0.74 s kern, 25% cpu
>         :........................... Cached on server ..... 1.474 s, 1.4 GB/s, 0.67 s kern, 46% cpu
> 2048M-mixed-2h
>    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 3.043 s, 722 MB/s, 0.78 s kern, 26% cpu
>    :    :........................... Cached on server ..... 1.374 s, 1.6 GB/s, 0.72 s kern, 53% cpu
>    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 2.913 s, 756 MB/s, 0.74 s kern, 26% cpu
>         :........................... Cached on server ..... 1.349 s, 1.6 GB/s, 0.67 s kern, 50% cpu
> 2048M-mixed-4d
>    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 3.275 s, 680 MB/s, 0.75 s kern, 24% cpu
>    :    :........................... Cached on server ..... 1.391 s, 1.5 GB/s, 0.71 s kern, 52% cpu
>    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 3.470 s, 626 MB/s, 0.72 s kern, 21% cpu
>         :........................... Cached on server ..... 1.456 s, 1.5 GB/s, 0.67 s kern, 46% cpu
> 2048M-mixed-4h
>    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 3.035 s, 743 MB/s, 0.74 s kern, 26% cpu
>    :    :........................... Cached on server ..... 1.345 s, 1.6 GB/s, 0.71 s kern, 53% cpu
>    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 2.848 s, 779 MB/s, 0.73 s kern, 26% cpu
>         :........................... Cached on server ..... 1.421 s, 1.5 GB/s, 0.68 s kern, 48% cpu
> 2048M-mixed-8d
>    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 3.262 s, 687 MB/s, 0.74 s kern, 24% cpu
>    :    :........................... Cached on server ..... 1.366 s, 1.6 GB/s, 0.69 s kern, 51% cpu
>    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 3.195 s, 709 MB/s, 0.72 s kern, 24% cpu
>         :........................... Cached on server ..... 1.414 s, 1.5 GB/s, 0.67 s kern, 48% cpu
> 2048M-mixed-8h
>    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 2.899 s, 789 MB/s, 0.73 s kern, 27% cpu
>    :    :........................... Cached on server ..... 1.338 s, 1.6 GB/s, 0.69 s kern, 52% cpu
>    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 2.910 s, 772 MB/s, 0.72 s kern, 26% cpu
>         :........................... Cached on server ..... 1.438 s, 1.5 GB/s, 0.67 s kern, 47% cpu
> 2048M-mixed-16d
>    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 3.416 s, 661 MB/s, 0.73 s kern, 23% cpu
>    :    :........................... Cached on server ..... 1.345 s, 1.6 GB/s, 0.70 s kern, 53% cpu
>    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 3.177 s, 713 MB/s, 0.70 s kern, 23% cpu
>         :........................... Cached on server ..... 1.447 s, 1.5 GB/s, 0.68 s kern, 47% cpu
> 2048M-mixed-16h
>    :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ... 2.919 s, 780 MB/s, 0.73 s kern, 26% cpu
>    :    :........................... Cached on server ..... 1.363 s, 1.6 GB/s, 0.70 s kern, 51% cpu
>    :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ... 2.934 s, 773 MB/s, 0.70 s kern, 25% cpu
>         :........................... Cached on server ..... 1.435 s, 1.5 GB/s, 0.67 s kern, 47% cpu

For this particular change, I'm interested only in cases where the
whole file is cached on the server. We're focusing on the efficiency
and performance of the protocol and transport here, not the underlying
filesystem (which is... xfs?).

Also, 2GB files can be read with just 2048 1MB READ requests. That
means we don't have a large sample size of READ operations for any
single test, assuming the client is using 1MB rsize.

Also, are these averages, or single runs? I think running each test
5-10 times (at least) and including some variance data in the results
would help build more confidence that the small differences in the
timing are not noise.

All that said, however, I see with some consistency that READ_PLUS
takes longer to pull data over the wire, but uses slightly less CPU.
Assuming the CPU utilization figures are client-side, that matches my
expectation: lower throughput should show up as lower CPU utilization.

Looking at the 100% data results with the file cached on the server
(1.394 s vs. 1.346 s), READ_PLUS takes 3.5% longer than READ. That to
me is a small but significant drop -- I think it will be noticeable
for large workloads. Can you explain the difference?

For subsequent test runs, can you find a server with more memory, test
with larger files, and test with a variety of rsize settings? You can
reduce your test matrix by leaving out the tests with holey files for
the moment.

> - v4:
>   - Change READ and READ_PLUS to return nfserr_serverfault if the
>     splice check fails.

At this point, the code looks fine, but I'd like to understand why
the performance is not the same.
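
The arithmetic behind these comments can be sketched roughly as follows.
This is only an illustration: the helper names (read_requests,
percent_slower, summarize) are hypothetical and not part of any NFS
tooling, the 1 MiB rsize is the assumption stated above, and the two
timings are the cached 100% data figures from the quoted table.

```python
from statistics import mean, stdev

MIB = 1024 * 1024


def read_requests(file_bytes, rsize_bytes):
    """READ (or READ_PLUS) calls needed to cover a file at a given rsize."""
    # Ceiling division: a partial final chunk still costs one request.
    return -(-file_bytes // rsize_bytes)


def percent_slower(baseline_s, candidate_s):
    """How much longer the candidate took than the baseline, in percent."""
    return (candidate_s - baseline_s) / baseline_s * 100.0


def summarize(samples_s):
    """Mean and sample standard deviation of repeated run times.

    Needs at least two samples; 5-10 runs per test is the suggestion above.
    """
    return mean(samples_s), stdev(samples_s)


if __name__ == "__main__":
    # A 2048 MiB file with an assumed 1 MiB rsize.
    print(read_requests(2048 * MIB, 1 * MIB))
    # READ vs. READ_PLUS, 2048M-data, cached on server.
    print(percent_slower(1.346, 1.394))
```

With the cached 100% data timings this comes out a little over 3.5%.
Whether a gap of that size is real or noise is what the per-test mean
and standard deviation from summarize() would help decide once repeated
runs are available.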

> Thanks,
> Anna
>
>
> Anna Schumaker (2):
>   NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
>   NFSD: Simplify READ_PLUS
>
> fs/nfsd/nfs4xdr.c | 141 +++++++++++-----------------------------------
> 1 file changed, 33 insertions(+), 108 deletions(-)
>
> --
> 2.37.3
>

--
Chuck Lever