On Sun 09-09-12 21:27:11, Dmitry Monakhov wrote: > fsync and punch_hole are the places where we have to wait for all > existing writers (writeback, aio, dio), but currently we simply > flush pended end_io request which is not sufficient. Why not? I guess you mean the fact that there can be DIO in flight for which end_io() was not called so it is not queued in the queue? But that is OK - we have not yet called aio_complete() for that IO so for userspace the write has not happened yet. Thus there's no need to flush it to disk - fsync() does not say anything about writes in progress while fsync is called. > Even more i_mutex is not holded while punch_hole which obviously > result in dangerous data corruption due to write-after-free. Yes, that's a bug. I also noticed that but didn't get to fixing it (I'm actually working on a more long term fix using range locking but that's more of a research project so having somehow fixed at least the most blatant locking problems is good). Honza > > This patch performs following changes: > > - Guard punch_hole with i_mutex > - fsync and punch_hole now wait for all writers in flight > NOTE: XXX write-after-free race is still possible because > truncate_pagecache_range() is not completely reliable and where > is no easy way to stop writeback while punch_hole is in progress. > > Signed-off-by: Dmitry Monakhov <dmonakhov@xxxxxxxxxx> > --- > fs/ext4/extents.c | 10 ++++++++-- > fs/ext4/fsync.c | 1 + > 2 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c > index e993879..8252651 100644 > --- a/fs/ext4/extents.c > +++ b/fs/ext4/extents.c > @@ -4845,6 +4845,7 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length) > return err; > } > > + mutex_lock(&inode->i_mutex); > /* Now release the pages */ > if (last_page_offset > first_page_offset) { > truncate_pagecache_range(inode, first_page_offset, > @@ -4852,12 +4853,15 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length) > } > > /* finish any pending end_io work */ > + inode_dio_wait(inode); > ext4_flush_completed_IO(inode); > > credits = ext4_writepage_trans_blocks(inode); > handle = ext4_journal_start(inode, credits); > - if (IS_ERR(handle)) > - return PTR_ERR(handle); > + if (IS_ERR(handle)) { > + err = PTR_ERR(handle); > + goto out_mutex; > + } > > err = ext4_orphan_add(handle, inode); > if (err) > @@ -4951,6 +4955,8 @@ out: > inode->i_mtime = inode->i_ctime = ext4_current_time(inode); > ext4_mark_inode_dirty(handle, inode); > ext4_journal_stop(handle); > +out_mutex: > + mutex_unlock(&inode->i_mutex); > return err; > } > int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, > diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c > index 24f3719..290c5cf 100644 > --- a/fs/ext4/fsync.c > +++ b/fs/ext4/fsync.c > @@ -204,6 +204,7 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync) > if (inode->i_sb->s_flags & MS_RDONLY) > goto out; > > + inode_dio_wait(inode); > ret = ext4_flush_completed_IO(inode); > if (ret < 0) > goto out; > -- > 1.7.7.6 > -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html