On 2015-10-12 15:46 -0400, J. Bruce Fields wrote:
> On Mon, Oct 12, 2015 at 03:25:38PM -0400, bfields wrote:
> > On Mon, Oct 12, 2015 at 12:48:56PM -0400, Nick Bowler wrote:
> > > I'm having a problem where, eventually, the nfs-mounted home directory
> > > on one of my machines starts failing in a kind of weird way. The issue
> > > appears to affect only sqlite; I have two applications that I know of
> > > which use it:
> > >
> > > - Firefox, where the symptom is that the browser just hangs randomly,
> > > - gmpc, which crashes immediately on startup with I/O error.
> > >
> > > Once the issue occurs these applications remain permanently broken.
> > > Since the latter is easier to test, I can run it in strace, and the
> > > failing syscall seems to be:
> > >
> > > fcntl(7, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824, len=1}) = -1 EIO (Input/output error)
> > >
> > > When the issue occurs, the client dmesg log is full of messages of the form:
> > >
> > > [3441972.381211] NFS: v4 server returned a bad sequence-id error on an unconfirmed sequence ffff88007612ae20!
> > >
> > > There are no unusual messages on the server.
[...]
> > I wonder if there's some way to make this reproduce more quickly, for
> > example by running something that makes more aggressive use of sqlite,
> > or running multiple copies of such a thing simultaneously. Might be
> > interesting to know what the pattern of file opens and locking looks
> > like (so stracing one of those applications might help).

I could try doing something like using the sqlite3 command-line tool to
do a lot of database operations, and hope I can reproduce. I'd have to
reboot to test though.

I attached a full strace log (gzipped) from a failing process. The
command run is:

  sqlite3 newfile.sqlite vacuum

which fails in a similar manner to gmpc.

> Oh, also I forgot to ask what version of the NFS protocol you're using
> (4.0, 4.1, or 4.2).

Looks like 4.0:

  athena:/home on /home type nfs4 (rw,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5,clientaddr=192.168.0.207,local_lock=none,addr=192.168.0.10)

Cheers,
  Nick
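
For checking whether a mount is in the failed state without involving sqlite at all, something like the following might work. It is only a sketch: it attempts the same non-blocking read lock that shows up in the strace output (one byte at offset 1073741824), using Python's fcntl.lockf. The function name try_read_lock and the choice to create the file if it is missing are assumptions made for illustration; with local_lock=none the request should go to the server as it does for sqlite.

```python
import fcntl
import os

# Offset taken from the failing fcntl() call in the strace output.
LOCK_OFFSET = 1073741824


def try_read_lock(path, start=LOCK_OFFSET, length=1):
    """Try a non-blocking POSIX read lock on `path`.

    Returns None if the lock was granted (it is released again),
    otherwise the errno reported by the kernel (e.g. errno.EIO).
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # LOCK_SH | LOCK_NB is issued as F_SETLK with F_RDLCK.
            fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB,
                        length, start, os.SEEK_SET)
        except OSError as exc:
            return exc.errno
        fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)
        return None
    finally:
        os.close(fd)
```

Calling try_read_lock() on a scratch file under the NFS mount should return None on a healthy client; if the problem is in the locking path rather than in sqlite, a broken client would presumably return EIO here too.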
Attachment: sqlite3-vacuum-strace.log.gz (binary data)
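
One possible way to make "more aggressive use of sqlite" concrete is sketched below, using Python's sqlite3 module rather than the command-line tool. It opens a fresh connection for every round, so each round repeats the open, lock, write, unlock and close sequence. The function name churn, the table name t and the default round count are made up for illustration, and there is no evidence that this access pattern triggers the bug any faster.

```python
import sqlite3
from contextlib import closing


def churn(db_path, rounds=100):
    """Open, write to and close the database `rounds` times.

    Returns the number of rows in the scratch table afterwards.
    """
    for i in range(rounds):
        # A new connection per round forces a new open() and new locks.
        with closing(sqlite3.connect(db_path, timeout=30)) as conn:
            with conn:  # commits on success, rolls back on error
                conn.execute("CREATE TABLE IF NOT EXISTS t (n INTEGER)")
                conn.execute("INSERT INTO t (n) VALUES (?)", (i,))
    with closing(sqlite3.connect(db_path, timeout=30)) as conn:
        return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

Several copies could be started at once against the same file on the NFS mount to get the "multiple copies simultaneously" case; the 30 second busy timeout is there so that concurrent writers wait for each other instead of failing immediately.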