> On Sep 5, 2018, at 4:36 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, 2018-09-05 at 15:33 -0400, Chuck Lever wrote:
>>> On Sep 5, 2018, at 3:23 PM, Trond Myklebust <trondmy@xxxxxxxxx>
>>> wrote:
>>>
>>> Fallout from a bunch of flame graphs...
>>
>> Hey, are these in a public git repo I can pull from?
>>
>
> I've pushed out my 'testing' branch to git.linux-nfs.org. The usual
> caveats apply: please do not treat that branch as being stable, assume
> it won't be rebased or assume that I won't change the contents. It is
> just there for testing purposes.
>
> Cheers
>   Trond
>
>>
>>> Trond Myklebust (7):
>>>   pNFS: Don't zero out the array in nfs4_alloc_pages()
>>>   pNFS: Don't allocate more pages than we need to fit a layoutget
>>> response
>>>   NFS: Convert lookups of the lock context to RCU
>>>   NFS: Simplify internal check for whether file is open for write
>>>   NFS: Convert lookups of the open context to RCU
>>>   NFSv4: Convert open state lookup to use RCU
>>>   NFSv4: Convert struct nfs4_state to use refcount_t
>>>
>>> fs/nfs/delegation.c                    | 11 ++--
>>> fs/nfs/filelayout/filelayout.c         |  1 +
>>> fs/nfs/flexfilelayout/flexfilelayout.c |  1 +
>>> fs/nfs/inode.c                         | 70 +++++++++++---------
>>> -----
>>> fs/nfs/nfs4_fs.h                       |  3 +-
>>> fs/nfs/nfs4proc.c                      | 38 ++++++++++----
>>> fs/nfs/nfs4state.c                     | 32 ++++++------
>>> fs/nfs/pnfs.c                          | 16 ++++--
>>> fs/nfs/pnfs.h                          |  1 +
>>> include/linux/nfs_fs.h                 |  2 +
>>> 10 files changed, 98 insertions(+), 77 deletions(-)
>>>
>>> --
>>> 2.17.1

Some performance testing results for the full "testing" series. The
two fio tests are designed to push the IOPS rate; the third, an iozone
run, is a QD=1 test to measure the latency of 512KB NFS WRITE
operations. All three tests use direct I/O.

The "without fair queuing" kernels have this commit reverted:

commit ae03d238e8a11ddc76668c64ad405cd8412446a6
Author:     Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
AuthorDate: Tue Sep 4 11:47:51 2018 -0400
Commit:     Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
CommitDate: Wed Sep 5 14:37:07 2018 -0400

    SUNRPC: Queue fairness for all.

Client: 12-core, two-socket, 56Gb InfiniBand
Server: 4-core, one-socket, 56Gb InfiniBand, tmpfs export

Test: /usr/bin/fio --size=1G --direct=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=8k --rwmixread=70 --iodepth=16 --numjobs=16 --runtime=240 --group_reporting

NFSv3 on RDMA:

Stock v4.19-rc2:
• read: IOPS=109k, BW=849MiB/s (890MB/s)(11.2GiB/13506msec)
• write: IOPS=46.6k, BW=364MiB/s (382MB/s)(4915MiB/13506msec)

Trond's kernel (with fair queuing):
• read: IOPS=83.0k, BW=649MiB/s (680MB/s)(11.2GiB/17676msec)
• write: IOPS=35.6k, BW=278MiB/s (292MB/s)(4921MiB/17676msec)

Trond's kernel (without fair queuing):
• read: IOPS=90.5k, BW=707MiB/s (742MB/s)(11.2GiB/16216msec)
• write: IOPS=38.8k, BW=303MiB/s (318MB/s)(4917MiB/16216msec)

NFSv3 on TCP (IPoIB):

Stock v4.19-rc2:
• read: IOPS=23.8k, BW=186MiB/s (195MB/s)(11.2GiB/61635msec)
• write: IOPS=10.2k, BW=79.9MiB/s (83.8MB/s)(4923MiB/61635msec)

Trond's kernel (with fair queuing):
• read: IOPS=25.9k, BW=202MiB/s (212MB/s)(11.2GiB/56710msec)
• write: IOPS=11.1k, BW=86.7MiB/s (90.9MB/s)(4916MiB/56710msec)

Trond's kernel (without fair queuing):
• read: IOPS=26.0k, BW=203MiB/s (213MB/s)(11.2GiB/56492msec)
• write: IOPS=11.1k, BW=87.0MiB/s (91.2MB/s)(4915MiB/56492msec)

Test: /usr/bin/fio --size=1G --direct=1 --rw=randread --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=100 --iodepth=1024 --numjobs=16 --runtime=240 --group_reporting

NFSv3 on RDMA:

Stock v4.19-rc2:
• read: IOPS=149k, BW=580MiB/s (608MB/s)(16.0GiB/28241msec)

Trond's kernel (with fair queuing):
• read: IOPS=81.5k, BW=318MiB/s (334MB/s)(16.0GiB/51450msec)

Trond's kernel (without fair queuing):
• read: IOPS=82.4k, BW=322MiB/s (337MB/s)(16.0GiB/50918msec)

NFSv3 on TCP (IPoIB):

Stock v4.19-rc2:
• read: IOPS=37.2k, BW=145MiB/s (153MB/s)(16.0GiB/112630msec)

Trond's kernel (with fair queuing):
• read: IOPS=2715, BW=10.6MiB/s (11.1MB/s)(2573MiB/242594msec)

Trond's kernel (without fair queuing):
• read: IOPS=2869, BW=11.2MiB/s (11.8MB/s)(2724MiB/242979msec)

Test: /home/cel/bin/iozone -M -i0 -s8g -r512k -az -I -N

My kernel: 4.19.0-rc2-00026-g50d68a4

system call latencies in microseconds, N=5:
• write: mean=602, std=13.0
• rewrite: mean=541, std=17.3

server round trip latency in microseconds, N=5:
• RTT: mean=354, std=3.0

Trond's kernel (with fair queuing):

system call latencies in microseconds, N=5:
• write: mean=572, std=10.6
• rewrite: mean=533, std=7.9

server round trip latency in microseconds, N=5:
• RTT: mean=352, std=2.7

--
Chuck Lever
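
For anyone who wants to compare the runs numerically, here is a rough sketch that computes the relative IOPS change of the "with fair queuing" kernel against stock v4.19-rc2, and cross-checks an fio line by converting IOPS and block size back to MiB/s. The IOPS values are copied from the fio read results above (with "k" taken as 1000); the dictionary layout and the helper names `pct_change` and `bw_mib_s` are made up for illustration and are not part of fio or of the patch series.

```python
# Read IOPS copied from the fio results in this message ("k" = 1000).
# Keys and structure are illustrative only.
READ_IOPS = {
    "8k randrw, RDMA": {"stock": 109_000, "fair": 83_000},
    "8k randrw, TCP": {"stock": 23_800, "fair": 25_900},
    "4k randread, RDMA": {"stock": 149_000, "fair": 81_500},
    "4k randread, TCP": {"stock": 37_200, "fair": 2_715},
}


def pct_change(baseline, value):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def bw_mib_s(iops, block_bytes):
    """Bandwidth in MiB/s implied by an IOPS rate at a fixed block size."""
    return iops * block_bytes / (1024 * 1024)


for name, iops in READ_IOPS.items():
    delta = pct_change(iops["stock"], iops["fair"])
    print(f"{name}: {delta:+.1f}% read IOPS vs. stock")

# Cross-check: rounded IOPS times block size should land close to the
# BW= figure fio printed on the same line (small gaps come from rounding).
print(f"8k randrw, RDMA, stock: ~{bw_mib_s(109_000, 8 * 1024):.0f} MiB/s")
```

Because the IOPS figures are rounded to three digits in the fio output, the implied bandwidth will only agree with the reported BW= value to within a percent or so.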