On Tue, Jun 06 2017, Lutz Vieweg wrote:

> On 07/29/2016 07:52 PM, Jeff Layton wrote:
>>>>>>>>> fcntl(7, F_SETLK, {type=F_RDLCK, whence=SEEK_SET,
>>>>>>>>> start=1073741824, len=1}) = -1 EIO (Input/output error)
>>>
>>> Unfortunately I did not manage to perform a network capture last time
>>> due to power loss. I did not hit this issue again until yesterday (~9
>>> months later), this time after 45 days of uptime.
>>>
>>> Kernel versions now are: 4.5.1 on the server, and 4.4.3 on the client.
>
> I wanted to add that I, too, have one NFS client and server
> (running linux-4.11.0 on both the server and the client)
> currently in the same kind of state:
>
> I can reproduce in 100% of the cases that the following commands:
>
>> rm -f x.sqlite
>> sqlite3 x.sqlite "PRAGMA case_sensitive_like=1;PRAGMA synchronous=OFF;PRAGMA recursive_triggers=ON;PRAGMA foreign_keys=OFF;PRAGMA locking_mode = NORMAL;PRAGMA journal_mode = TRUNCATE;"
>
> result in:
>
>> "Error: disk I/O error"
>
> on the client - while working fine on the NFS server - with the same kind
> of strace output:
>
>> fcntl(3, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824, len=1}) = -1 EIO (Input/output error)
>> write(2, "Error: disk I/O error\n", 22Error: disk I/O error
>
> But unlike the original reporter, we use the NFS v3 protocol:
>> server:/data on /data type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,mountvers=3,mountport=20048,mountproto=udp,local_lock=none)
>
> If you want me to try or trace something on the client,
> I'm willing to help.

Using "soft" is not a good idea.  It could be the cause, but it isn't
very likely if NFS is otherwise working OK.
It might help to run

   rpcdebug -m nfs -s all; rpcdebug -m nlm -s all; rpcdebug -m rpc -s all
   # repeat your test
   rpcdebug -m nfs -c all; rpcdebug -m nlm -c all; rpcdebug -m rpc -c all

then collect the kernel logs (possibly just run "dmesg") and post all
the messages which happened at that time.

It might also help to find the port number that lockd is running on

   rpcinfo -p $SERVER | grep 'tcp.*nlockmgr'

(use the 4th column) and

   tcpdump -s 0 -w /tmp/trace.pcap port 2049 or port $LOCKD_PORT &
   # run test
   killall tcpdump
   gzip /tmp/trace.pcap

and put it somewhere it can be fetched from - or maybe post as an
attachment if it isn't too big.

NeilBrown
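
The "use the 4th column" step can be scripted rather than read off by
eye. The following is only a sketch: it assumes the usual "rpcinfo -p"
column layout (program, version, protocol, port, service), and the
helper names lockd_port and capture_filter as well as the variable
LOCKD_PORT are made up for illustration. Nothing in it touches the
network, the real commands are left as comments.

```shell
#!/bin/sh
# Print the TCP port of nlockmgr from `rpcinfo -p` output read on stdin.
# Assumed columns: program, version, protocol, port, service.
# (lockd_port is a hypothetical helper name, not an existing tool.)
lockd_port() {
    awk '$3 == "tcp" && $5 == "nlockmgr" { print $4; exit }'
}

# Build the tcpdump filter covering NFS (port 2049) and the lockd port
# passed as the first argument. (capture_filter is also hypothetical.)
capture_filter() {
    printf 'port 2049 or port %s\n' "$1"
}

# Intended use (not run here; SERVER stands for the NFS server's name):
#   LOCKD_PORT=$(rpcinfo -p "$SERVER" | lockd_port)
#   tcpdump -s 0 -w /tmp/trace.pcap $(capture_filter "$LOCKD_PORT") &
#   # run test
#   killall tcpdump
#   gzip /tmp/trace.pcap
```

If lockd_port prints nothing, nlockmgr is probably not registered for
TCP on the server, and that is worth mentioning along with the logs.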