On Wed 02-09-09 11:14:26, Nick Piggin wrote: > On Tue, Sep 01, 2009 at 08:29:29PM +0200, Jan Kara wrote: > > Hi Nick, > > > > On Fri 10-07-09 17:30:33, npiggin@xxxxxxx wrote: > > > I have some questions, marked with XXX. > > > > > > Cc: linux-ext4@xxxxxxxxxxxxxxx > > > Signed-off-by: Nick Piggin <npiggin@xxxxxxx> > > > --- > > > fs/ext2/ext2.h | 1 > > > fs/ext2/file.c | 2 > > > fs/ext2/inode.c | 138 +++++++++++++++++++++++++++++++++++++++++++++----------- > > > 3 files changed, 113 insertions(+), 28 deletions(-) > > > > > > Index: linux-2.6/fs/ext2/inode.c > > > =================================================================== > > > --- linux-2.6.orig/fs/ext2/inode.c > > > +++ linux-2.6/fs/ext2/inode.c > > ... > > > +static void ext2_truncate_blocks(struct inode *inode, loff_t offset) > > > +{ > > > + /* > > > + * XXX: it seems like a bug here that we don't allow > > > + * IS_APPEND inode to have blocks-past-i_size trimmed off. > > > + * review and fix this. > > > + */ > > > + if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) || > > > + S_ISLNK(inode->i_mode))) > > > + return -EINVAL; > > > + if (ext2_inode_is_fast_symlink(inode)) > > > + return -EINVAL; > > > + if (IS_APPEND(inode) || IS_IMMUTABLE(inode)) > > > + return -EPERM; > > > + __ext2_truncate_blocks(inode, offset); > > > +} > > Actually, looking again into this, I think you've introduced a subtle bug > > into the code. When a write fails for some reason, we did vmtruncate() > > previously which called block_truncate_page() which zeroed a tail of the > > last block. Now, ext2_truncate_blocks() does not do this and I think it > > could be a problem because at least in direct IO case, we could have written > > some data into that block on disk. > > We really rely on the tail of the block being zeroed because otherwise > > extending truncate will show those old data in the block. I plan to change > > that in my mkwrite fixes but until then, we should preserve the old > > assumptions. > > So I think that ext2_truncate_blocks() should do all that tail page magic > > as well (although it's not needed in all cases). > > Hi Jan, > > Yeah I did think about this and yes we usually do need to zero out > the page but for these error cases with normal writes we shouldn't > write anything in there. For direct IO... I don't see the problem > because we're not coherent with pagecache anyway... We are not coherent but that's irrelevant - if a failed direct write is followed by an extending truncate and read, it will read the block from disk and could see non-zeros where there should be zeros... > Hmm, but possiby it is a good idea just to keep the same block_truncate_page > calls for this patchset and we can look at it again with your truncate > patches. I'll work on fixing these up. Yes, I think that keeping this change for a separate patch is definitely better. Honza -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html