Re: [PATCH 12/12] dax: New fault locking

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri 18-03-16 15:16:18, Jan Kara wrote:
> On Wed 16-03-16 08:34:28, NeilBrown wrote:
> > On Fri, Mar 11 2016, NeilBrown wrote:
> > 
> > > On Fri, Mar 11 2016, Jan Kara wrote:
> > >
> > >> Currently DAX page fault locking is racy.
> > >>
> > >> CPU0 (write fault)		CPU1 (read fault)
> > >>
> > >> __dax_fault()			__dax_fault()
> > >>   get_block(inode, block, &bh, 0) -> not mapped
> > >> 				  get_block(inode, block, &bh, 0)
> > >> 				    -> not mapped
> > >>   if (!buffer_mapped(&bh))
> > >>     if (vmf->flags & FAULT_FLAG_WRITE)
> > >>       get_block(inode, block, &bh, 1) -> allocates blocks
> > >>   if (page) -> no
> > >> 				  if (!buffer_mapped(&bh))
> > >> 				    if (vmf->flags & FAULT_FLAG_WRITE) {
> > >> 				    } else {
> > >> 				      dax_load_hole();
> > >> 				    }
> > >>   dax_insert_mapping()
> > >>
> > >> And we are in a situation where we fail in dax_radix_entry() with -EIO.
> > >>
> > >> Another problem with the current DAX page fault locking is that there is
> > >> no race-free way to clear dirty tag in the radix tree. We can always
> > >> end up with clean radix tree and dirty data in CPU cache.
> > >>
> > >> We fix the first problem by introducing locking of exceptional radix
> > >> tree entries in DAX mappings acting very similarly to page lock and thus
> > >> synchronizing properly faults against the same mapping index. The same
> > >> lock can later be used to avoid races when clearing radix tree dirty
> > >> tag.
> > >
> > > Hi,
> > >  I think the exception locking bits look good - I cannot comment on the
> > >  rest.
> > >  I looks like it was a good idea to bring the locking into dax.c instead
> > >  of trying to make it generic.
> > >
> > 
> > Actually ... I'm still bothered by the exclusive waiting.  If an entry
> > is locked and there are two threads in dax_pfn_mkwrite() then one would
> > be woken up when the entry is unlocked and it will just set the TAG_DIRTY
> > flag and then continue without ever waking the next waiter on the
> > wait queue.
> > 
> > I *think* that any thread which gets an exclusive wakeup is responsible
> > for performing another wakeup.  In this case it must either lock the
> > slot, or call __wakeup.
> > That means:
> >   grab_mapping_entry needs to call wakeup:
> >      if radix_tree_preload() fails
> >      if radix_tree_insert fails other than with -EEXIST
> >      if a valid page was found
> 
> Why would we need to call wake up when a valid page was found? In that case
> there should not be any process waiting for the radix tree entry lock.
> Otherwise I agree with you. Thanks for pointing this out, you've likely
> saved me quite some debugging ;).
> 
> 
> >   dax_delete_mapping_entry needs to call wakeup
> >      if the fail case, though as that isn't expect (WARN_ON_ONCE)
> >         it should be a problem not to wakeup here
> >   dax_pfn_mkwrite needs to call wakeup unconditionally

Actually, after some thought I don't think the wakeup is needed except for
dax_pfn_mkwrite(). In the other cases we know there is no radix tree
exceptional entry and thus there can be no waiters for its lock...

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux