Hi-

I’m looking into NFS WRITE latency on the Linux NFS server. With a high speed network fabric and ultra-fast storage (think persistent memory) we can get the per-WRITE latency under 50us (as observed on the wire at the server).

One source of the remaining latency appears to be nfsd_open. Here is an example of a call graph captured during a simple iozone 4KB direct write workload (v4.4-rc3):

2)               |  nfsd_open [nfsd]() {
2)               |    fh_verify [nfsd]() {
2)               |      rqst_exp_find [nfsd]() {
2)               |        exp_find [nfsd]() {
2)               |          exp_find_key [nfsd]() {
2)   0.043 us    |            svc_expkey_hash [nfsd]();
2)               |            sunrpc_cache_lookup [sunrpc]() {
2)   0.027 us    |              _raw_read_lock();
2)               |              getboottime64() {
2)   0.022 us    |                ns_to_timespec();
2)   0.210 us    |              }
2)   0.021 us    |              get_seconds();
2)   0.854 us    |            }
2)               |            cache_check [sunrpc]() {
2)               |              getboottime64() {
2)   0.021 us    |                ns_to_timespec();
2)   0.206 us    |              }
2)   0.022 us    |              get_seconds();
2)   0.596 us    |            }
2)   2.050 us    |          }
2)               |          exp_get_by_name [nfsd]() {
2)               |            svc_export_lookup [nfsd]() {
2)               |              sunrpc_cache_lookup [sunrpc]() {
2)   0.022 us    |                _raw_read_lock();
2)   0.024 us    |                svc_export_match [nfsd]();
2)               |                getboottime64() {
2)   0.023 us    |                  ns_to_timespec();
2)   0.210 us    |                }
2)   0.022 us    |                get_seconds();
2)   0.997 us    |              }
2)   1.181 us    |            }
2)               |            cache_check [sunrpc]() {
2)               |              getboottime64() {
2)   0.022 us    |                ns_to_timespec();
2)   0.210 us    |              }
2)   0.022 us    |              get_seconds();
2)   0.592 us    |            }
2)   2.147 us    |          }
2)   4.560 us    |        }
2)   4.768 us    |      }
2)               |      prepare_creds() {
2)               |        kmem_cache_alloc() {
2)   0.021 us    |          _cond_resched();
2)   0.233 us    |        }
2)               |        security_prepare_creds() {
2)               |          selinux_cred_prepare() {
2)               |            kmemdup() {
2)               |              __kmalloc_track_caller() {
2)   0.022 us    |                kmalloc_slab();
2)   0.023 us    |                _cond_resched();
2)   0.448 us    |              }
2)   0.640 us    |            }
2)   0.838 us    |          }
2)   1.129 us    |        }
2)   1.808 us    |      }
2)   0.022 us    |      override_creds();
2)               |      exportfs_decode_fh() {
2)               |        shmem_fh_to_dentry() {
2)               |          ilookup5() {
2)               |            ilookup5_nowait() {
2)   0.026 us    |              _raw_spin_lock();
2)               |              find_inode() {
2)   0.031 us    |                shmem_match();
2)   0.029 us    |                _raw_spin_lock();
2)   0.429 us    |              }
2)   0.826 us    |            }
2)   0.022 us    |            _cond_resched();
2)   0.021 us    |            _cond_resched();
2)   1.417 us    |          }
2)               |          d_find_alias() {
2)   0.022 us    |            _raw_spin_lock();
2)   0.027 us    |            _raw_spin_lock();
2)   0.423 us    |          }
2)   0.035 us    |          iput();
2)   2.410 us    |        }
2)               |        find_acceptable_alias() {
2)   0.026 us    |          nfsd_acceptable [nfsd]();
2)   0.220 us    |        }
2)   3.032 us    |      }
2)               |      nfsd_setuser_and_check_port [nfsd]() {
2)   0.023 us    |        nfsexp_flags [nfsd]();
2)               |        nfsd_setuser [nfsd]() {
2)               |          revert_creds() {
2)               |            __put_cred() {
2)               |              call_rcu_sched() {
2)               |                __call_rcu() {
2)   0.049 us    |                  __call_rcu_nocb_enqueue();
2)   0.265 us    |                }
2)   0.446 us    |              }
2)   0.661 us    |            }
2)   0.854 us    |          }
2)               |          prepare_creds() {
2)               |            kmem_cache_alloc() {
2)   0.022 us    |              _cond_resched();
2)   0.219 us    |            }
2)               |            security_prepare_creds() {
2)               |              selinux_cred_prepare() {
2)               |                kmemdup() {
2)               |                  __kmalloc_track_caller() {
2)   0.024 us    |                    kmalloc_slab();
2)   0.022 us    |                    _cond_resched();
2)   0.412 us    |                  }
2)   0.615 us    |                }
2)   0.801 us    |              }
2)   0.993 us    |            }
2)   1.651 us    |          }
2)   0.033 us    |          set_groups();
2)   0.023 us    |          override_creds();
2)   3.323 us    |        }
2)   0.029 us    |        nfserrno [nfsd]();
2)   3.919 us    |      }
2)   0.025 us    |      check_nfsd_access [nfsd]();
2)   0.022 us    |      nfsd_permission [nfsd]();
2) + 14.904 us   |    }

So, 15us for fh_verify by itself.
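(For context, the workload driving this is roughly the following. This is just a minimal sketch for illustration, not iozone itself, and the mount path and iteration count are invented. Each pwrite() becomes one 4KB NFS WRITE on the wire, and with NFSv3 each of those WRITEs goes through the nfsd_open path traced above on the server.)

/*
 * Minimal sketch only -- approximates a 4KB direct write workload
 * against a file on an NFS mount.  Path and iteration count are
 * made up for illustration.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        const char *path = "/mnt/nfs/testfile";   /* assumed mount point */
        const size_t iosize = 4096;
        const int count = 100000;                 /* arbitrary */
        void *buf;
        int fd, i;

        /* O_DIRECT requires an aligned buffer */
        if (posix_memalign(&buf, iosize, iosize))
                return 1;
        memset(buf, 0xa5, iosize);

        fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* each iteration is one 4KB NFS WRITE as seen on the wire */
        for (i = 0; i < count; i++) {
                if (pwrite(fd, buf, iosize, (off_t)i * iosize) != (ssize_t)iosize) {
                        perror("pwrite");
                        break;
                }
        }

        close(fd);
        free(buf);
        return 0;
}

The call graph continues with dentry_open: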
2)               |    dentry_open() {
2)               |      get_empty_filp() {
2)               |        kmem_cache_alloc() {
2)   0.021 us    |          _cond_resched();
2)   0.325 us    |        }
2)               |        security_file_alloc() {
2)               |          selinux_file_alloc_security() {
2)               |            kmem_cache_alloc() {
2)   0.022 us    |              _cond_resched();
2)   0.251 us    |            }
2)   0.441 us    |          }
2)   0.633 us    |        }
2)   0.022 us    |        __mutex_init();
2)   1.572 us    |      }
2)               |      vfs_open() {
2)               |        do_dentry_open() {
2)               |          path_get() {
2)   0.025 us    |            mntget();
2)   0.313 us    |          }
2)   0.029 us    |          try_module_get();
2)               |          security_file_open() {
2)               |            selinux_file_open() {
2)   0.024 us    |              avc_policy_seqno();
2)               |              inode_has_perm.isra.21() {
2)   0.148 us    |                avc_has_perm();
2)   0.332 us    |              }
2)   0.728 us    |            }
2)   0.023 us    |            __fsnotify_parent();
2)   0.034 us    |            fsnotify();
2)   1.359 us    |          }
2)   0.045 us    |          file_ra_state_init();
2)   2.548 us    |        }
2)   2.733 us    |      }
2)   4.667 us    |    }
2)               |    ima_file_check() {
2)   0.028 us    |      process_measurement();
2)   0.211 us    |    }
2)   0.023 us    |    nfserrno [nfsd]();
2) + 20.549 us   |  }

And that’s 20us total for this nfsd_open. Seems like a lot of work for each NFSv3 WRITE operation, and plenty of opportunities for lock contention if my server happened to be otherwise busy. Note these figures are a little inflated because ftrace itself adds some latency.

Anyway, making fh_verify faster could have a significant impact on per-op latency with fast storage. The recently-proposed NFSD open cache by itself could eliminate the need to perform fh_verify so often, for example.

--
Chuck Lever
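P.S. To make the open cache idea a bit more concrete, here is a tiny userspace analogy. This is emphatically not the proposed NFSD code; the table, names, and sizes are all invented for illustration. The point is simply that the expensive lookup/open work is paid once per file, and every subsequent write reuses the cached result, which is the effect I would hope a server-side open cache gives us per filehandle:

/*
 * Userspace analogy only -- not the actual NFSD open cache patches.
 * Pay the open/verification cost once per file, keep the descriptor
 * in a small table keyed by "filehandle" (here just a path), and let
 * later writes reuse it.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CACHE_SLOTS 16

struct open_cache_entry {
        char key[256];          /* stands in for the NFS filehandle */
        int  fd;                /* already-opened, already-verified file */
        int  valid;
};

static struct open_cache_entry cache[CACHE_SLOTS];

/* Return a cached fd, or open (the expensive step) and cache it. */
static int cached_open(const char *key)
{
        int i, free_slot = -1;

        for (i = 0; i < CACHE_SLOTS; i++) {
                if (cache[i].valid && strcmp(cache[i].key, key) == 0)
                        return cache[i].fd;     /* cache hit: no open */
                if (!cache[i].valid && free_slot < 0)
                        free_slot = i;
        }
        if (free_slot < 0)
                return -1;                      /* no eviction in this sketch */

        cache[free_slot].fd = open(key, O_WRONLY | O_CREAT, 0644);
        if (cache[free_slot].fd < 0)
                return -1;
        snprintf(cache[free_slot].key, sizeof(cache[free_slot].key), "%s", key);
        cache[free_slot].valid = 1;
        return cache[free_slot].fd;
}

int main(void)
{
        char buf[4096];
        int i, fd;

        memset(buf, 0, sizeof(buf));
        /* 1000 writes, but only one open()/lookup for the file */
        for (i = 0; i < 1000; i++) {
                fd = cached_open("/tmp/opencache-demo");   /* assumed path */
                if (fd < 0 || pwrite(fd, buf, sizeof(buf), (off_t)i * sizeof(buf)) < 0)
                        return 1;
        }
        return 0;
}

In nfsd terms the cache key would of course be the filehandle rather than a path, and eviction and refcounting are the hard parts this sketch ignores.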