Re: inode_dio_wait and hole-punch?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon 14-05-12 18:13:36, Hugh Dickins wrote:
> On Tue, 15 May 2012, Jan Kara wrote:
> > On Sun 13-05-12 14:26:40, Hugh Dickins wrote:
> > > In looking through hole-punching implementations, I got the impression
> > > that perhaps ext4 and ocfs2 might need to inode_dio_wait() before they
> > > truncate pagecache.  And that might require ext4 to hold i_mutex there?
> >   Not necessarily before truncating page cache AFAICS
> 
> Yes, DIO would hold reference to pages, so truncating cache should be safe.
> 
> > but certainly before we go and remove blocks from the file.
> 
> Yes, that seems more like it.
> 
> > In case of ext4 doing that before
> > ext4_flush_completed_IO() call would look most appropriate I'd think. Good
> > spotting! Will you send a patch?
> 
> No, Jan, I'm afraid of doing more harm than good to ext4 here -
> I'd much rather trust you to take that risk with my data!
  Oh, I am flattered. I can corrupt a few more blocks for you in reward ;).

> In particular, it's not clear to me whether ext4 needs to take i_mutex
> there, so inode_dio_wait() semantics are complete.  And it's not clear
> to me precisely where it should take i_mutex if so: maybe it actually
> needs to take i_mutex across the whole function, I do not know.
> 
> Notice (I sent Allison reminder of it yesterday) that ext4_ext_punch_hole()
> is already wrong in its "if (offset >= inode->i_size) return 0" at the
> start, which makes fallocate PUNCH_HOLE unable to remove blocks added
> earlier by fallocate KEEP_SIZE beyond eof.
> 
> But I know nothing of ext4 block allocation rules, so that's something
> also where a "fix" from me would be more likely to do harm than good.
  Hum, indeed the locking around ext4_ext_punch_hole() looks a bit
insufficient. Even i_mutex need not be enough because of races with page
faults and writeback. That filemap_write_and_wait_range() call is just a
feeble attempt to avoid those. i_data_sem we take further down is enough to
stop all these races but currently I'm not clear what part of the code
before we take that lock can be just discarded, what part is correct and
what part is buggy and needs better locking. I'll take a deeper look...

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux