From: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx>

These patches add client support for the READ_PLUS operation. This
operation is meant to improve file reading performance when working with
sparse files, but underlying filesystem performance on the server side
may have an effect on the actual read performance. I've done a bunch of
testing on virtual machines, and I found that READ_PLUS performs best if:
  1) The file being read is not yet in the server's page cache.
  2) The read request begins with a hole segment. And
  3) The server only performs one llseek() call during encoding.

I've added a "noreadplus" mount option to allow users to disable the new
operation if it becomes a problem, similar to the "nordirplus" mount
option that we already have.

Here are the results of my performance tests, separated by underlying
filesystem and by whether the file is already in the cache. The NFS v4.2
column is for the standard READ operation, and v4.2+ is with READ_PLUS.
In addition to the 100% data and 100% hole cases, I also test with files
that alternate between data and hole chunks. I tested with two files for
each chunk size, one beginning with a data segment and one beginning
with a hole.

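For illustration, test files with this alternating layout could be
generated along the following lines. This is only a sketch in Python, not
the script used for these measurements: the helper names, the 0xff fill
byte, and the sizes are assumptions, and whether the skipped ranges
actually become holes on disk depends on the filesystem.

```python
import os


def chunk_layout(file_size, chunk_size, start_with_hole):
    """Return (offset, length, is_data) tuples for alternating chunks."""
    layout = []
    is_data = not start_with_hole
    for offset in range(0, file_size, chunk_size):
        length = min(chunk_size, file_size - offset)
        layout.append((offset, length, is_data))
        is_data = not is_data
    return layout


def write_sparse_file(path, file_size, chunk_size, start_with_hole):
    """Write only the data chunks; ranges that are skipped read back as zeros."""
    with open(path, "wb") as f:
        for offset, length, is_data in chunk_layout(
            file_size, chunk_size, start_with_hole
        ):
            if is_data:
                f.seek(offset)
                f.write(b"\xff" * length)  # hypothetical fill pattern
        # Extend the file to its full size when the last chunk is a hole.
        f.truncate(file_size)
```

Calling write_sparse_file("sparse-4k-hole", 1 << 30, 4096, True) would
produce a file comparable in layout to a "Sparse 4K (hole)" case; the file
name and the 1 GiB size are placeholders.
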
I used the `vmtouch` utility to load and clear the file from the server's
cache, and I used the following `dd` command on the client for reading
back the file:

    $ dd if=$src of=/dev/null bs=$rsize_from_mount 2>&1

 xfs (uncached)         | NFS v3    NFS v4.0  NFS v4.1  NFS v4.2  NFS v4.2+
------------------------+--------------------------------------------------
 Whole File (data)      | 3.228s    3.361s    3.679s    3.382s    3.483s
 Whole File (hole)      | 1.276s    1.086s    1.143s    1.066s    0.805s
 Sparse 4K (data)       | 3.473s    3.953s    3.740s    3.535s    3.515s
 Sparse 4K (hole)       | 3.373s    3.192s    3.120s    3.113s    2.709s
 Sparse 8K (data)       | 3.782s    3.527s    3.589s    3.476s    3.494s
 Sparse 8K (hole)       | 3.161s    3.328s    2.974s    2.889s    2.863s
 Sparse 16K (data)      | 3.804s    3.945s    3.885s    3.507s    3.569s
 Sparse 16K (hole)      | 2.961s    3.124s    3.413s    3.136s    2.712s
 Sparse 32K (data)      | 2.891s    3.632s    3.833s    3.643s    3.485s
 Sparse 32K (hole)      | 2.592s    2.216s    2.545s    2.665s    2.829s

 xfs (cached)           | NFS v3    NFS v4.0  NFS v4.1  NFS v4.2  NFS v4.2+
------------------------+--------------------------------------------------
 Whole File (data)      | 0.939s    0.943s    0.939s    0.942s    1.153s
 Whole File (hole)      | 0.982s    1.007s    0.991s    0.946s    0.826s
 Sparse 4K (data)       | 0.980s    0.999s    0.961s    0.996s    1.166s
 Sparse 4K (hole)       | 1.001s    0.972s    0.997s    1.001s    1.201s
 Sparse 8K (data)       | 1.272s    1.053s    0.999s    0.974s    1.200s
 Sparse 8K (hole)       | 0.965s    1.004s    1.036s    1.006s    1.248s
 Sparse 16K (data)      | 0.995s    0.993s    1.035s    1.054s    1.210s
 Sparse 16K (hole)      | 0.966s    0.982s    1.091s    1.038s    1.214s
 Sparse 32K (data)      | 1.054s    0.968s    1.045s    0.990s    1.203s
 Sparse 32K (hole)      | 1.019s    0.960s    1.001s    0.983s    1.254s

 ext4 (uncached)        | NFS v3    NFS v4.0  NFS v4.1  NFS v4.2  NFS v4.2+
------------------------+--------------------------------------------------
 Whole File (data)      | 6.089s    6.104s    6.489s    6.342s    6.137s
 Whole File (hole)      | 2.603s    2.258s    2.226s    2.315s    1.715s
 Sparse 4K (data)       | 7.063s    7.372s    7.064s    7.149s    7.459s
 Sparse 4K (hole)       | 7.231s    6.709s    6.495s    6.880s    6.138s
 Sparse 8K (data)       | 6.576s    6.938s    6.386s    6.086s    6.154s
 Sparse 8K (hole)       | 5.903s    6.089s    5.555s    5.578s    5.442s
 Sparse 16K (data)      | 6.556s    6.257s    6.135s    5.588s    5.856s
 Sparse 16K (hole)      | 5.504s    5.290s    5.545s    5.195s    4.983s
 Sparse 32K (data)      | 5.047s    5.490s    5.734s    5.578s    5.378s
 Sparse 32K (hole)      | 4.232s    3.860s    4.299s    4.466s    4.633s

 ext4 (cached)          | NFS v3    NFS v4.0  NFS v4.1  NFS v4.2  NFS v4.2+
------------------------+--------------------------------------------------
 Whole File (data)      | 1.873s    1.881s    1.869s    1.890s    2.344s
 Whole File (hole)      | 1.929s    2.009s    1.963s    1.917s    1.554s
 Sparse 4K (data)       | 1.961s    1.974s    1.957s    1.986s    2.408s
 Sparse 4K (hole)       | 2.056s    2.025s    1.977s    1.988s    2.458s
 Sparse 8K (data)       | 2.297s    2.038s    2.008s    1.954s    2.437s
 Sparse 8K (hole)       | 1.939s    2.011s    2.024s    2.015s    2.509s
 Sparse 16K (data)      | 1.907s    1.973s    2.053s    2.070s    2.411s
 Sparse 16K (hole)      | 1.940s    1.964s    2.075s    1.996s    2.422s
 Sparse 32K (data)      | 2.045s    1.921s    2.021s    2.013s    2.388s
 Sparse 32K (hole)      | 1.984s    1.944s    1.997s    1.974s    2.398s

 btrfs (uncached)       | NFS v3    NFS v4.0  NFS v4.1  NFS v4.2  NFS v4.2+
------------------------+--------------------------------------------------
 Whole File (data)      | 9.369s    9.438s    9.837s    9.840s    11.790s
 Whole File (hole)      | 4.052s    3.390s    3.380s    3.619s    2.519s
 Sparse 4K (data)       | 9.738s    10.110s   9.774s    9.819s    12.471s
 Sparse 4K (hole)       | 9.907s    9.504s    9.241s    9.610s    9.054s
 Sparse 8K (data)       | 9.132s    9.453s    8.954s    8.660s    10.555s
 Sparse 8K (hole)       | 8.290s    8.489s    8.305s    8.332s    7.850s
 Sparse 16K (data)      | 8.742s    8.507s    8.667s    8.002s    9.940s
 Sparse 16K (hole)      | 7.635s    7.604s    7.967s    7.558s    7.062s
 Sparse 32K (data)      | 7.279s    7.670s    8.006s    7.705s    9.219s
 Sparse 32K (hole)      | 6.200s    5.713s    6.268s    6.464s    6.486s

 btrfs (cached)         | NFS v3    NFS v4.0  NFS v4.1  NFS v4.2  NFS v4.2+
------------------------+--------------------------------------------------
 Whole File (data)      | 2.770s    2.814s    2.841s    2.854s    3.492s
 Whole File (hole)      | 2.871s    2.970s    3.001s    2.929s    2.372s
 Sparse 4K (data)       | 2.945s    2.905s    2.930s    2.951s    3.663s
 Sparse 4K (hole)       | 3.032s    3.057s    2.962s    3.050s    3.705s
 Sparse 8K (data)       | 3.277s    3.069s    3.127s    3.034s    3.652s
 Sparse 8K (hole)       | 2.866s    2.959s    3.078s    2.989s    3.762s
 Sparse 16K (data)      | 2.916s    2.923s    3.060s    3.081s    3.631s
 Sparse 16K (hole)      | 2.948s    2.969s    3.108s    2.990s    3.623s
 Sparse 32K (data)      | 3.044s    2.881s    3.052s    2.962s    3.585s
 Sparse 32K (hole)      | 2.954s    2.957s    3.018s    2.951s    3.639s

I also have performance numbers for if we encode every hole and data
segment but I figured this email was long enough already. I'm happy to
share it if requested!

Thoughts?
Anna

-------------------------------------------------------------------------

Anna Schumaker (6):
  SUNRPC: Split out a function for setting current page
  SUNRPC: Add the ability to expand holes in data pages
  SUNRPC: Add the ability to shift data to a specific offset
  NFS: Add basic READ_PLUS support
  NFS: Add support for decoding multiple segments
  NFS: Add a mount option for READ_PLUS

 fs/nfs/nfs42xdr.c          | 164 +++++++++++++++++++++++++
 fs/nfs/nfs4client.c        |   3 +
 fs/nfs/nfs4proc.c          |  32 ++++-
 fs/nfs/nfs4xdr.c           |   1 +
 fs/nfs/super.c             |  21 ++++
 include/linux/nfs4.h       |   3 +-
 include/linux/nfs_fs_sb.h  |   2 +
 include/linux/nfs_xdr.h    |   2 +-
 include/linux/sunrpc/xdr.h |   2 +
 net/sunrpc/xdr.c           | 244 ++++++++++++++++++++++++++++++++++++-
 10 files changed, 467 insertions(+), 7 deletions(-)

-- 
2.20.1