Hi Christoph, On Mon 17-10-16 12:26:57, Christoph Hellwig wrote: > > Ping? I have ext4 DAX read & write path working with the iomap code but to > > convert the fault path, I need this resolved. Are you OK with moving > > iomap_begin() / iomap_end() calls outside of page lock / entry lock in the > > fault path? > > Yes, that sounds fine. I've been looking into this some more and realized it's not as easy as I've originally though. ->page_mkwrite callback is expected to return with the page locked so locking it inside the iomap actor is really awkward (think of situation when blocksize < pagesize). Since nobody currently has issues with ->iomap_begin being sometimes called with page lock and sometimes without, I don't think changing that would be worth the hassle. Just one more question: Doesn't XFS have some lock ordering issues when xfs_file_iomap_begin() gets called with page lock held from iomap_page_mkwrite() for a file with extent size hints and thus we end up in xfs_iomap_write_direct() with page lock held? We do a lot of stuff there including transaction setup and such... For DAX I have switched the locking order as there we fully handle the fault and unlock radix tree entry inside the ->fault / ->page_mkwrite handler so the locking is sane and also ext4 needs this to be able to use the iomap infrastructure. The locking difference between DAX and !DAX path is unfortunate but looks as the least evil to me. > > I was also thinking about the implications of iomap_begin() (and thus block > > allocation for buffered writes in nodelalloc case) being no longer protected > > by page lock and at least for ext2 / ext3 compatibility modes this will > > lead to uninitialized data exposure when page fault races in the right way > > with buffered write. So current locking scheme in iomap code is not easily > > usable for ext4 for buffered writes. > > Right, the iomap I/O code assumes that you either use delalloc, or that > your have unwritten extents that your convert to written once I/O has > finished. XFS uses the latter feature for direct I/O, or if an extent > size hints is set. > > I can't really think of a good way to handle file systems without > either feature - the problem is that we'd have to introduce a file-wide > (or range based) lock to protect allocated but not written ranges. Yup, I had something similar in mind but definitely don't want to go into that rathole now ;). Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html