Hi,

last September I reported to this list [1] some NFS client hangs related to NFSv4.0 and Kerberos that we are seeing on our systems. Unfortunately, the issue is still there and seems to affect kernels 3.x up to 4.17-rc1. We are observing this problem in production (on our workstations and compute cluster nodes); certain lseek-write-fsync patterns seem to trigger it. Affected processes hang indefinitely (and uninterruptibly).

I have written a demo program which performs the lseek-write-fsync cycle and reliably demonstrates the hangs on our systems. To (hopefully) make reproducing the issue easier for people who know how to debug or fix it, I have written a little script, see [2], which sets up two virtual machines and configures them as required (NFSv4.0 export with Kerberos):

  $ ./build.sh   # set up the two VMs using debootstrap
  $ ./server.sh  # start the NFS server in QEMU
  $ ./client.sh  # start the NFS client in QEMU

Then, log into the client (as "user" with password "pass"), call "kinit" (entering "pass" again) and run

  $ /hang.sh

The process should then hang rather quickly ("INFO: task writesync:1443 blocked for more than 120 seconds.") and the system cannot recover from that state (for the kernel messages, see [1]). In recent kernels (e.g., 4.17-rc1), it seems that, at least sometimes, instead of a hanging task one gets a kernel panic after the OOM killer has killed all processes.

I couldn't "git bisect" the problem because I was unable to find a kernel that is not affected (the oldest kernel I could try was 3.16).

We observe the hangs only when all of the following hold: NFSv4.0 (not 4.1 or 4.2) is used; Kerberos is used (sec=krb5, sec=krb5i or sec=krb5p; sec=krb5p seems most likely to show the behavior); and the client and server are fast enough. Slowing down either the client or the server (putting more load on the server, running the client VM without -enable-kvm, etc.) makes the hangs go away.
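For readers who want a feel for the I/O pattern without building the VMs, the lseek-write-fsync cycle might look roughly like the sketch below. This is not the demo program from [2]; the function name, block size, stride and iteration count are assumptions chosen purely for illustration, and the sketch has not been checked against the hang.

```python
import os


def lseek_write_fsync(path, iterations=100, block_size=4096, stride=8192):
    """Seek to an offset, write one block, fsync; repeat.

    All parameters are illustrative assumptions, not values taken from
    the original demo program. Returns the total number of bytes written.
    """
    payload = b"x" * block_size
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for i in range(iterations):
            # Position the file offset explicitly before every write.
            os.lseek(fd, i * stride, os.SEEK_SET)
            written += os.write(fd, payload)
            # Force the data out to the server before the next cycle.
            os.fsync(fd)
    finally:
        os.close(fd)
    return written
```

To try it, one would call lseek_write_fsync() with a path on the Kerberized NFSv4.0 mount, for example a hypothetical /mnt/nfs/testfile, after obtaining a ticket with kinit.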
On our production systems, the NFS server is a Nexenta system, so the problem seems to be independent of the NFS server implementation. When the server is quite busy, the hangs occur seldom; when the server is under low load, the hangs on the clients happen much more often.

I hope somebody has an idea how to eliminate this problem.

Regards,
Armin

[1] https://marc.info/?l=linux-nfs&m=150620442017672
[2] https://gitlab.infosun.fim.uni-passau.de/groessli/nfs-krb5-vms