Re: PROBLEM: nfs I/O errors with sqlite applications

On 2015-10-12 15:46 -0400, J. Bruce Fields wrote:
> On Mon, Oct 12, 2015 at 03:25:38PM -0400, bfields wrote:
> > On Mon, Oct 12, 2015 at 12:48:56PM -0400, Nick Bowler wrote:
> > > I'm having a problem where, eventually, the nfs-mounted home directory
> > > on one of my machines starts failing in a kind of weird way.  The issue
> > > appears to affect only sqlite; I have two applications that I know of
> > > which use it:
> > > 
> > >   - Firefox, where the symptom is that the browser just hangs randomly,
> > >   - gmpc, which crashes immediately on startup with I/O error.
> > > 
> > > Once the issue occurs these applications remain permanently broken.
> > > Since the latter is easier to test, I can run it in strace, and the
> > > failing syscall seems to be:
> > > 
> > >   fcntl(7, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824, len=1}) = -1 EIO (Input/output error)
> > > 
> > > When the issue occurs, the client dmesg log is full of messages of the form:
> > > 
> > >   [3441972.381211] NFS: v4 server returned a bad sequence-id error on an unconfirmed sequence ffff88007612ae20!
> > > 
> > > There are no unusual messages on the server.
[...]
> > I wonder if there's some way to make this reproduce more quickly, for
> > example by running something that makes more aggressive use of sqlite,
> > or running multiple copies of such a thing simultaneously.  Might be
> > interesting to know what the pattern of file opens and locking looks
> > like (so stracing one of those applications might help).

I could try using the sqlite3 command-line tool to run a large number of
database operations and see whether that reproduces the problem.  I'd
have to reboot to test, though.
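
Something along these lines might serve as the stress test.  This is
only a rough sketch, not something I have run against the failing
mount: it uses Python's sqlite3 module rather than the command-line
tool, and the function name, table name and round counts are all made
up for illustration.

```python
import sqlite3

def hammer(db_path, rounds=200):
    """Insert rows and periodically VACUUM one database.

    Returns (rounds_completed, error_message); error_message is None
    if every round succeeded.
    """
    done = 0
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, v TEXT)"
        )
        conn.commit()
        for i in range(rounds):
            # Each commit makes sqlite take and release its file locks.
            conn.execute("INSERT INTO t (v) VALUES (?)", (f"row {i}",))
            conn.commit()
            if i % 50 == 49:
                # Same operation as the failing command-line test.
                conn.execute("VACUUM")
            done += 1
    except sqlite3.Error as exc:
        return done, str(exc)
    finally:
        conn.close()
    return done, None
```

Calling hammer() on a database in the NFS-mounted home directory from
several processes at once would cover the "multiple copies" idea as
well.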

I attached a full strace log (gzipped) from a failing process.  The
command run is:

  sqlite3 newfile.sqlite vacuum

which fails in a similar manner to gmpc.
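
The lock request from the gmpc trace quoted above could also be issued
on its own, without sqlite in the picture.  A minimal sketch, assuming
Python's fcntl module; the offset and length are copied from that
strace line, and the function name is hypothetical:

```python
import fcntl
import os

LOCK_OFFSET = 1073741824  # "start" value from the strace line

def try_read_lock(path, start=LOCK_OFFSET, length=1):
    """Try a non-blocking read lock on one byte range of path.

    Returns None if the lock was granted, otherwise the errno.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # LOCK_SH | LOCK_NB is issued as fcntl(F_SETLK) with F_RDLCK.
        fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB,
                    length, start, os.SEEK_SET)
    except OSError as exc:
        return exc.errno
    finally:
        # Closing the descriptor also drops the lock.
        os.close(fd)
    return None
```

If this returned errno.EIO on a file in the affected mount, that would
suggest the problem lies in the locking path and not in anything
sqlite-specific.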

> Oh, also I forgot to ask what version of the NFS protocol you're using
> (4.0, 4.1, or 4.2).

Looks like 4.0:

  athena:/home on /home type nfs4 (rw,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5,clientaddr=192.168.0.207,local_lock=none,addr=192.168.0.10)
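
For checking several clients, the version can be pulled out of that
line mechanically.  A small sketch, assuming the mount(8) output format
shown above; the helper name is hypothetical:

```python
def nfs_version(mount_line):
    """Return the vers= option from a mount(8) output line, or None."""
    start = mount_line.rfind("(")
    end = mount_line.rfind(")")
    if start == -1 or end < start:
        return None
    # Options are a comma-separated list inside the parentheses.
    for opt in mount_line[start + 1:end].split(","):
        key, _, value = opt.partition("=")
        if key == "vers":
            return value
    return None
```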

Cheers,
  Nick

Attachment: sqlite3-vacuum-strace.log.gz
Description: Binary data

