Re: [fuse-devel] fuse_lowlevel_notify_inval_inode deadlock

On Fri, 19 Aug 2011, Miklos Szeredi wrote:
> Sage Weil <sage@xxxxxxxxxxxx> writes:
> 
> > Hi,
> >
> > I just tried using fuse_lowlevel_notify_inval_inode for the ceph fuse 
> > client and ran into a deadlock.  This translates into a call to 
> > invalidate_inode_pages2(), which will lock each page in the address_space.  
> > I end up with a process stuck on
> >
> > [<ffffffff81106a9e>] sleep_on_page+0xe/0x20
> > [<ffffffff81106a87>] __lock_page+0x67/0x70
> > [<ffffffff81114023>] invalidate_inode_pages2_range+0x373/0x390
> > [<ffffffff81260815>] fuse_reverse_inval_inode+0x75/0x90
> > [<ffffffff812589c3>] fuse_dev_do_write+0x8d3/0xae0
> > [<ffffffff81258c3c>] fuse_dev_write+0x6c/0x70
> > [<ffffffff8115d563>] do_sync_readv_writev+0xd3/0x110
> > [<ffffffff8115e3c4>] do_readv_writev+0xd4/0x1e0
> > [<ffffffff8115e518>] vfs_writev+0x48/0x60
> > [<ffffffff8115e651>] sys_writev+0x51/0xc0
> > [<ffffffff815cae02>] system_call_fastpath+0x16/0x1b
> >
> > I assume this is due to a racing write(2) or something.  Has anyone else 
> > seen this?
> 
> Fuse's write function locks the pages being written to.  So yes, doing a
> fuse_lowlevel_notify_inval_inode() on the same file from the write call
> will reliably deadlock.
> 
> > Would invalidate_mapping_pages() make more sense here?  Locked pages (due 
> > to writers) would be skipped, but that seems sane enough to me for a 
> > concurrent write(2) and invalidate callback. 
> 
> What exactly is the purpose of invalidating the page cache in write?

I took a closer look at my logs and it looks like this is what's happening:

 - cfuse: we get a server callback message, take a mutex
 - kernel/fuse: a write starts, locks pages
 - cfuse: we call fuse_lowlevel_notify_inval_inode()
 - cfuse: the write call (or something that precedes it in the queue) 
   blocks on the mutex
 -> deadlock: neither the write nor the invalidate can complete.

So basically I can't hold any locks during the invalidate call, so that I 
can be sure that the write will complete and we don't deadlock.  That's a 
little inconvenient: I can't use the lock to order the invalidation with 
respect to any other operations (say, a subsequent read(2) that shouldn't 
see stale data) because I have no idea whether a write(2) may have been 
started on the kernel side and may be working its way through the fuse 
channel.

On the other hand, doing invalidate_mapping_pages() means I may leave 
partially stale data in the page cache that is still marked Uptodate if 
a racing write only overwrites part of a page.

Anyway, clearly fuse is doing the right thing here.  I just need to push 
this to another thread to do it properly from my end.  That complicates 
things a bit, but it's doable.
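
A minimal sketch of that hand-off, assuming a small bounded queue and one 
worker thread.  All of the names here (inval_queue, inval_queue_push, 
log_inval, ...) are hypothetical and not part of libfuse or cfuse.  The 
callback type stands in for a wrapper around 
fuse_lowlevel_notify_inval_inode(); the stub below only records inode 
numbers so the sketch runs without a mounted filesystem.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVAL_QUEUE_CAP 64

/* Hypothetical callback type: a real client would wrap
 * fuse_lowlevel_notify_inval_inode() behind this. */
typedef int (*inval_fn)(uint64_t ino, void *ctx);

struct inval_queue {
    pthread_mutex_t lock;      /* protects only the queue, never held across fn */
    pthread_cond_t  nonempty;
    uint64_t        inos[INVAL_QUEUE_CAP];
    size_t          head, count;
    bool            stopping;
    inval_fn        fn;
    void           *ctx;
    pthread_t       worker;
};

static void *inval_worker(void *arg)
{
    struct inval_queue *q = arg;

    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->count == 0 && !q->stopping)
            pthread_cond_wait(&q->nonempty, &q->lock);
        if (q->count == 0)
            break;                      /* stopping and fully drained */
        uint64_t ino = q->inos[q->head];
        q->head = (q->head + 1) % INVAL_QUEUE_CAP;
        q->count--;
        pthread_mutex_unlock(&q->lock);
        q->fn(ino, q->ctx);             /* may block on page locks; no locks held */
        pthread_mutex_lock(&q->lock);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

int inval_queue_start(struct inval_queue *q, inval_fn fn, void *ctx)
{
    assert(fn != NULL);
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = q->count = 0;
    q->stopping = false;
    q->fn = fn;
    q->ctx = ctx;
    return pthread_create(&q->worker, NULL, inval_worker, q);
}

/* Safe to call with the client mutex held: it only touches the queue.
 * Returns 0 on success, -1 if the queue is full. */
int inval_queue_push(struct inval_queue *q, uint64_t ino)
{
    int ret = -1;

    pthread_mutex_lock(&q->lock);
    if (q->count < INVAL_QUEUE_CAP) {
        q->inos[(q->head + q->count) % INVAL_QUEUE_CAP] = ino;
        q->count++;
        ret = 0;
        pthread_cond_signal(&q->nonempty);
    }
    pthread_mutex_unlock(&q->lock);
    return ret;
}

/* Drains any queued invalidations, then joins the worker. */
void inval_queue_stop(struct inval_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->stopping = true;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->worker, NULL);
    pthread_cond_destroy(&q->nonempty);
    pthread_mutex_destroy(&q->lock);
}

/* Stub for illustration only: records inodes instead of calling libfuse. */
struct inval_log { uint64_t seen[INVAL_QUEUE_CAP]; size_t n; };

int log_inval(uint64_t ino, void *ctx)
{
    struct inval_log *log = ctx;

    if (log->n < INVAL_QUEUE_CAP)
        log->seen[log->n++] = ino;
    return 0;
}
```

The server-callback handler would call inval_queue_push() while holding 
the client mutex and return right away; only the worker ever blocks in the 
kernel, so a write queued behind the mutex can still complete.  The 
ordering problem described above remains: the invalidate now runs at some 
later point relative to other operations.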

Thanks!
sage