On Thu 05-03-15 09:00:05, Dave Chinner wrote: > On Wed, Mar 04, 2015 at 05:18:48PM +0100, Jan Kara wrote: > > On Wed 04-03-15 10:30:24, Dave Chinner wrote: > > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > > > Add the initial support for DAX file operations to XFS. This > > > includes the necessary block allocation and mmap page fault hooks > > > for DAX to function. > > > > > > Note that the current block allocation code abuses the mapping > > > buffer head to provide a completion callback for unwritten extent > > > allocation when DAX is clearing blocks. The DAX interface needs to > > > be changed to provide a callback similar to get_blocks for this > > > callback. > > > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > ..... > > > +static int > > > +xfs_filemap_dax_page_mkwrite( > > > + struct vm_area_struct *vma, > > > + struct vm_fault *vmf) > > > +{ > > > + struct xfs_inode *ip = XFS_I(vma->vm_file->f_mapping->host); > > > + int error; > > > + > > > + trace_xfs_filemap_page_mkwrite(ip); > > > + > > > + xfs_ilock(ip, XFS_MMAPLOCK_SHARED); > > So I think the lock ordering of XFS_MMAPLOCK and freezing protection is > > suspicious (and actually so is for normal write faults as I'm looking - > > didn't realize that when I was first reading your MMAPLOCK patches). > > Because you take XFS_MMAPLOCK outside of freeze protection however usually > > we want freeze protection to be the outermost lock - in particular in > > xfs_file_fallocate() you take XFS_MMAPLOCK inside freeze protection I > > think. > > OK, so why isn't lockdep triggering on that? lockdep is aware of > inode locks and the freeze states, supposedly to pick up these exact > issues... > > Oh, probably because the sb freeze order is write, pagefault, > transaction. > > i.e. In the fallocate case, we do sb_start_write, MMAP_LOCK. If we are in > a freeze case, we aren't going to freeze page faults until we've > frozen all the writes have drained, so there isn't a lock order > dependency there. Same for any other mnt_want_write/sb-start_write > based modification. > > Hence the fallocate path and anything that runs through setattr will > complete and release the mmap lock and then be prevented from taking > it again by the time sb_start_pagefault() can block with the mmap > lock held. So there isn't actually a deadlock there because of the > way freeze works, and that's why lockdep is staying silent. Yeah, you're right there isn't a deadlock possibility. After all the lock ranking of your MMAP_LOCk is currently the same as of mmap_sem (and the difficult lock ordering of that semaphore has been the reason why we have special type of freeze protection for page faults). > Still, I probably need to fix it so I'm not leaving a potential > landmine around. I would find it easier to grasp. Yes. > > So you'll need to do what ext4 needs to do - take freeze protection, take > > fs specific locks, and then call do_dax_fault(). Matthew has a patch to > > actually export do_dax_fault (as __dax_fault()) for filesystems. > > pointer to it? if none, I'll just write my own.... http://permalink.gmane.org/gmane.comp.file-systems.ext4/47866 Honza -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html