Delay before NFS list/create file operations

Sorry if this is not the correct way to ask for help, however I want
to avoid people trying to help me "tune" my nfs mount settings and
figured that taking my problem to the actual devs may get to the heart
of my issue faster.

We are running a custom Linux 4.14.132 kernel and nfs-common 1.3.4-2.1 on a
Debian Stretch vCenter VM, using NFSv3 over UDP to connect to a NetApp
filer via a 10Gbps link.  We have several identical servers, all
accessing the same NFS export/qtree, all showing the same symptoms.
There are 1.8M files in that directory (a lot, but everything works
fine, until it doesn't).

Essentially, the issue is that after a few hours of moderate to heavy
use we begin to notice a delay prior to most NFS operations that don't
target a specific file.  It literally appears that every call to
create or list files on NFS is being delayed, and that delay grows to
>30s over a few hours and never goes away until we umount+mount or
reboot.

To show the difference, I mounted the same NFS export a second time
from an affected server and timed writing a 1GB file to each of the
mounts.  On the fresh mount (/mnt/nfs) I see about what you would
expect (dd says it wrote everything in 9.65s and the total execution
time was 9.7s):

# time dd if=/dev/zero of=/mnt/nfs/testfile1 bs=1G count=1

1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 9.65657 s, 111 MB/s

real    0m9.719s
user    0m0.000s
sys     0m1.284s

However when performing the same operation from the affected mount, we
see that the total execution time was almost 6X longer (54s) than the
actual write operation (still only ~9s).  It literally sat there doing
nothing for 45 seconds before any writing occurred:

# time dd if=/dev/zero of=/opt/rpath/testfile2 bs=1G count=1

1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 9.17179 s, 117 MB/s

real    0m54.226s
user    0m0.001s
sys     0m1.238s
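
The dd timing above only shows the total wall time against the write
time.  As a rough sketch of how the wait could be pinned to a specific
call without strace, the script below times the create (open), the
write and the close separately.  It assumes Python 3 is available on
the client, which has not been confirmed; the function name
time_create_phases and the test file name are made up for illustration.

```python
import os
import sys
import tempfile
import time


def time_create_phases(path, size=1024 * 1024):
    """Create one file and return seconds spent in open, write+fsync, and close."""
    timings = {}
    payload = b"\0" * size

    # Phase 1: create the file. A stall before any data moves shows up here.
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    timings["open"] = time.monotonic() - start

    # Phase 2: write the payload and flush it to the server.
    start = time.monotonic()
    written = 0
    while written < size:
        written += os.write(fd, payload[written:])
    os.fsync(fd)
    timings["write"] = time.monotonic() - start

    # Phase 3: close the descriptor.
    start = time.monotonic()
    os.close(fd)
    timings["close"] = time.monotonic() - start
    return timings


if __name__ == "__main__":
    # Directory to test; defaults to the local temp dir if none is given.
    target_dir = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    target = os.path.join(target_dir, "phase_test.%d" % os.getpid())
    try:
        result = time_create_phases(target)
        for phase in ("open", "write", "close"):
            print("%-6s %.3f s" % (phase, result[phase]))
    finally:
        if os.path.exists(target):
            os.unlink(target)
```

Run once with /mnt/nfs and once with /opt/rpath as the argument.  If
the symptom is what it appears to be, the "open" figure on the affected
mount should account for most of the extra time.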

In addition to creating files, a simple "ls -1f" will wait 30+ seconds
before displaying the first ~500 filenames, then wait 30+ seconds
again before the next ~500, and so on.
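
The listing stalls could be measured in a similar way.  The sketch
below walks a directory unsorted, much like "ls -1f", and records every
pause between consecutive entries that exceeds a threshold.  Again this
assumes Python 3 on the client; readdir_gaps and its parameters are
hypothetical names, not part of any existing tool.

```python
import os
import sys
import time


def readdir_gaps(directory, threshold=1.0, limit=None):
    """Walk a directory unsorted and record pauses between entries.

    Returns (entries_seen, stalls), where stalls is a list of
    (entries_seen_before_pause, pause_seconds) for pauses >= threshold.
    The first pause also includes the time taken to open the directory.
    """
    stalls = []
    count = 0
    last = time.monotonic()
    for _entry in os.scandir(directory):
        now = time.monotonic()
        if now - last >= threshold:
            stalls.append((count, now - last))
        count += 1
        last = now
        # Optionally stop early; listing 1.8M entries is not needed to see the pattern.
        if limit is not None and count >= limit:
            break
    return count, stalls


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "."
    seen, found = readdir_gaps(path, threshold=1.0, limit=5000)
    print("entries read: %d" % seen)
    for index, pause in found:
        print("after %d entries: waited %.1f s" % (index, pause))
```

If the pauses land at regular entry counts, that would suggest the wait
happens each time the client fetches another batch of directory
entries rather than once per listing.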

Any operation that targets an explicit filename does not seem to be
affected and all operations on a fresh mount perform exactly as you
would expect.

The mountstats and nfsiostat commands also show that NFS is behaving
normally; all statistics are normal, fast and timely... yet every
affected operation has a delay before it actually starts
reading/writing from NFS.
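
For anyone who wants to compare the two mounts numerically, here is a
minimal sketch that reads the per-op counters from
/proc/self/mountstats and prints the average queue, round-trip and
execute times per operation.  It assumes the statvers=1.1 layout, where
each per-op line carries eight counters (ops, transmissions, timeouts,
bytes sent, bytes received, then cumulative queue, RTT and execute
times in milliseconds); per_op_averages is a made-up helper name, and
Python 3 on the client is an assumption.

```python
import sys


def per_op_averages(mountstats_text, mountpoint):
    """Return {op: (ops, avg_queue_ms, avg_rtt_ms, avg_execute_ms)} for one mount."""
    averages = {}
    in_mount = False
    in_per_op = False
    for line in mountstats_text.splitlines():
        words = line.split()
        if line.startswith("device "):
            # "device <source> mounted on <mountpoint> with fstype ..."
            in_mount = len(words) > 4 and words[4] == mountpoint
            in_per_op = False
        elif in_mount and line.strip() == "per-op statistics":
            in_per_op = True
        elif in_mount and in_per_op and words and words[0].endswith(":"):
            try:
                # Only the first eight counters are used; newer kernels may add more.
                values = [int(w) for w in words[1:9]]
            except ValueError:
                continue
            if len(values) < 8 or values[0] == 0:
                continue
            ops, queue_ms, rtt_ms, execute_ms = values[0], values[5], values[6], values[7]
            averages[words[0].rstrip(":")] = (
                ops, queue_ms / ops, rtt_ms / ops, execute_ms / ops)
    return averages


if __name__ == "__main__" and len(sys.argv) > 1:
    with open("/proc/self/mountstats") as handle:
        stats = per_op_averages(handle.read(), sys.argv[1])
    for op in sorted(stats):
        print("%-12s ops=%d queue=%.1fms rtt=%.1fms exec=%.1fms"
              % ((op,) + stats[op]))
```

Pass the mount point (for example /opt/rpath) as the argument.  If
these averages look the same on the fresh and the affected mount while
the wall-clock delay differs, that would be consistent with the wait
happening before the request ever reaches the RPC layer.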

I have spent hours on the phone with NetApp and we are planning to do
packet captures, but we are fairly confident that the issue is client
side.

I don't believe that the issue is in NFS itself, otherwise I would expect
to see an impact on the statistics, which leaves something in the kernel feeding the
requests to it, but I have no idea where to look to pinpoint the
delay.  Unfortunately these systems do not have strace or many other
tools installed, so I am fairly limited in what I can do to
troubleshoot client side issues.

I hoped that someone on this list may have seen this issue before or
otherwise have a good place to start looking.

I am not a member of the mailing list, so I would appreciate responses
being sent directly to me at joe@xxxxxxxxxxx.  http://vger.kernel.org
appears to be down, so I am not sure how to join.

Thanks for any direction you can provide!


Joe


