fuse_lowlevel_notify_inval_inode deadlock

Sage Weil <sage@xxxxxxxxxxx> · Wed, 17 Apr 2013 17:43:18 -0700 (PDT)

We've hit a new deadlock with fuse_lowlevel_notify_inval_inode, this time 
on the read side:

- ceph-fuse queues an invalidate (in a separate thread)
- kernel initiates a read
- invalidate blocks in kernel, waiting on a page lock
- the read blocks in ceph-fuse

Now, assuming we're reading the stack traces properly, this is more or 
less the same deadlock we saw with writes, only on the read path, and the 
obvious fix ("don't block the read") would resolve it.

But!  If that is the only way to avoid the deadlock, I'm afraid it is 
difficult to implement reliable cache invalidation at all.  We are 
invalidating because the server told us to: we are no longer allowed to 
do reads, and any cached data is invalid.  The obvious approach is to

1- stop processing new reads
2- let in-progress reads complete
3- invalidate the cache
4- ack to server
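
A minimal userspace sketch of these four steps, assuming a simple gate 
around the read handlers.  ReadGate and its methods are hypothetical 
names, not ceph-fuse or libfuse code, and drop_cache only stands in for 
the point where fuse_lowlevel_notify_inval_inode would be called.  It 
models the userspace side only; the kernel page lock involved in the 
deadlock described at the top is not represented here.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>

// Hypothetical gate; not part of ceph-fuse or libfuse.
class ReadGate {
public:
    // Called on entry to a read handler; blocks while an invalidate is pending.
    void begin_read() {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [this] { return !closed_; });
        ++inflight_;
    }

    // Called when a read handler has finished.
    void end_read() {
        std::lock_guard<std::mutex> l(m_);
        if (--inflight_ == 0)
            cv_.notify_all();  // wake an invalidate waiting for the drain
    }

    // Runs steps 1-4 in order, then reopens the gate.
    void invalidate(const std::function<void()>& drop_cache,
                    const std::function<void()>& ack) {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [this] { return !closed_; });        // one invalidate at a time
        closed_ = true;                                  // 1: stop new reads
        cv_.wait(l, [this] { return inflight_ == 0; });  // 2: drain in-progress reads
        l.unlock();
        drop_cache();                                    // 3: invalidate the cache
        ack();                                           // 4: ack to server
        l.lock();
        closed_ = false;
        cv_.notify_all();                                // let blocked reads proceed
    }

    int inflight() {
        std::lock_guard<std::mutex> l(m_);
        return inflight_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool closed_ = false;
    int inflight_ = 0;
};
```

A read handler would bracket its work with begin_read()/end_read(), and 
the invalidate thread would call invalidate() with the two callbacks.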

...but that will deadlock as above, as any new read will lock pages before 
blocking.  If we don't block, then the read may repopulate pages we just 
invalidated.  We could

1- invalidate
2- if any reads happened while we were invalidating, goto 1
3- ack

but then we risk starvation and livelock.
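
The retry loop could be sketched roughly as below, assuming each read 
handler bumps a counter on entry so the invalidator can tell whether a 
read raced with it.  ReadTracker and invalidate_until_quiet are 
hypothetical names, and the max_passes bound is something I've added for 
illustration: it is there precisely because nothing else guarantees the 
loop ever terminates under a steady stream of reads.

```cpp
#include <atomic>
#include <cstdint>
#include <functional>

// Hypothetical bookkeeping; not part of ceph-fuse or libfuse.
struct ReadTracker {
    std::atomic<std::uint64_t> started{0};  // total reads ever started
    std::atomic<int> inflight{0};           // reads currently in progress

    void begin_read() { started.fetch_add(1); inflight.fetch_add(1); }
    // Assumed to be called only after the read reply has been sent.
    void end_read() { inflight.fetch_sub(1); }
};

// Repeats the invalidate until one pass completes with no read in flight
// at its start and no new read started during it.  Returns the number of
// passes used, or -1 if max_passes was exhausted (the livelock case).
int invalidate_until_quiet(ReadTracker& t,
                           const std::function<void()>& invalidate,
                           int max_passes) {
    for (int pass = 1; pass <= max_passes; ++pass) {
        const std::uint64_t before = t.started.load();
        const bool idle = (t.inflight.load() == 0);
        invalidate();                            // 1: invalidate
        if (idle && t.started.load() == before)  // 2: did any read race with us?
            return pass;                         // 3: caller may ack
    }
    return -1;  // readers kept arriving; never safe to ack
}
```

A return of -1 is the starvation case: the caller still owes the server 
an ack and has no safe point at which to send it.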

How do other people solve this problem?  It seems like another upcall that 
would let you block new reads (and/or writes) from starting while the 
invalidate is in progress would do the trick, but I'm not convinced I'm 
not missing something much simpler.

sage