Re: Lock ordering in iomap code

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Christoph,

On Mon 17-10-16 12:26:57, Christoph Hellwig wrote:
> > Ping? I have ext4 DAX read & write path working with the iomap code but to
> > convert the fault path, I need this resolved. Are you OK with moving
> > iomap_begin() / iomap_end() calls outside of page lock / entry lock in the
> > fault path?
> 
> Yes, that sounds fine.

I've been looking into this some more and realized it's not as easy as I've
originally though. ->page_mkwrite callback is expected to return with the
page locked so locking it inside the iomap actor is really awkward (think
of situation when blocksize < pagesize). Since nobody currently has issues
with ->iomap_begin being sometimes called with page lock and sometimes
without, I don't think changing that would be worth the hassle. Just one
more question: Doesn't XFS have some lock ordering issues when
xfs_file_iomap_begin() gets called with page lock held from
iomap_page_mkwrite() for a file with extent size hints and thus we end up
in xfs_iomap_write_direct() with page lock held? We do a lot of stuff there
including transaction setup and such...

For DAX I have switched the locking order as there we fully handle the
fault and unlock radix tree entry inside the ->fault / ->page_mkwrite
handler so the locking is sane and also ext4 needs this to be able to use
the iomap infrastructure. The locking difference between DAX and !DAX path
is unfortunate but looks as the least evil to me.

> > I was also thinking about the implications of iomap_begin() (and thus block
> > allocation for buffered writes in nodelalloc case) being no longer protected
> > by page lock and at least for ext2 / ext3 compatibility modes this will
> > lead to uninitialized data exposure when page fault races in the right way
> > with buffered write. So current locking scheme in iomap code is not easily
> > usable for ext4 for buffered writes.
> 
> Right, the iomap I/O code assumes that you either use delalloc, or that
> your have unwritten extents that your convert to written once I/O has
> finished.  XFS uses the latter feature for direct I/O, or if an extent
> size hints is set.
> 
> I can't really think of a good way to handle file systems without
> either feature - the problem is that we'd have to introduce a file-wide
> (or range based) lock to protect allocated but not written ranges.

Yup, I had something similar in mind but definitely don't want to go into
that rathole now ;).

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux