Re: Unwritten extent zeroing beyond i_size

Jan Kara <jack@xxxxxxx> · Thu, 14 Mar 2013 11:56:29 +0100



On Thu 14-03-13 08:56:55, Lukáš Czerner wrote:
> On Wed, 13 Mar 2013, Jan Kara wrote:
> 
> > Date: Wed, 13 Mar 2013 10:56:40 +0100
> > From: Jan Kara <jack@xxxxxxx>
> > To: Dmitry Monakhov <dmonakhov@xxxxxxxxxx>
> > Cc: linux-ext4@xxxxxxxxxxxxxxx, Ted Tso <tytso@xxxxxxx>
> > Subject: Unwritten extent zeroing beyond i_size
> > 
> >   Hello Dmitry,
> > 
> >   I'm tracking down a failure in xfstests test 274 (fallocate + ENOSPC
> > testing). The problem I found (and that's really unrelated to the question
> > I want to ask) is that if a write beyond i_size fails, we truncate the file
> > to i_size to remove any blocks that may have been allocated under the page
> > by the write before it failed (think of a blocksize < pagesize config).
> > 
> > Now in this test the write fails because it needs to split an unwritten
> > extent and there's no space for that, and zeroing out is impossible because
> > we are beyond i_size. And here comes my question: You disallowed zeroing of
> > extents beyond i_size because fsck complains about those. Wouldn't it be
> > better to just add an inode flag saying "this inode has blocks preallocated
> > beyond i_size" and make fsck not complain about such blocks? IMHO that
> > would catch 99% of corruptions as well and would let us solve the problem
> > with ENOSPC on writes to preallocated space (plus it would simplify the
> > kernel code).
> > 
> > 								Honza
> 
> Unfortunately this will not solve the real issue, which is that writing
> into preallocated space should _not_ fail at all, because it is
> preallocated.
> 
> The problem right now is that we simply do not have a block to
> allocate for metadata, and there is no way for us to reserve metadata
> blocks in advance as we might try to do in delalloc.
  But if you don't need to split the extent (you will change the whole
extent from unwritten to written state) you don't need any additional
metadata blocks. So the write cannot fail...
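
  To make the point about splitting concrete, here is a toy model in C. It
is only a sketch and not ext4 code: struct toy_extent and both functions are
made-up names, and it assumes the write lies fully inside a single unwritten
extent. It counts how many extent entries remain after part of the extent is
marked written. Only a result above zero extra entries can ever require new
metadata, and even then only if the tree has no free slot for them.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one unwritten extent covering blocks [start, start + len). */
struct toy_extent {
	uint64_t start;
	uint64_t len;
};

/*
 * Number of extents left after marking [wstart, wstart + wlen) written.
 * Assumes wlen > 0 and that the range lies fully inside the extent.
 */
int toy_extents_after_write(struct toy_extent ex, uint64_t wstart,
			    uint64_t wlen)
{
	int pieces = 1;				/* the written part itself */

	if (wstart > ex.start)
		pieces++;			/* unwritten head remains */
	if (wstart + wlen < ex.start + ex.len)
		pieces++;			/* unwritten tail remains */
	return pieces;
}

/* Extra extent entries needed; 0 means the extent is converted whole. */
int toy_extra_entries(struct toy_extent ex, uint64_t wstart, uint64_t wlen)
{
	return toy_extents_after_write(ex, wstart, wlen) - 1;
}

/* Small usage example for an 8-block extent starting at block 100. */
void toy_self_check(void)
{
	struct toy_extent ex = { 100, 8 };

	assert(toy_extra_entries(ex, 100, 8) == 0);	/* whole extent */
	assert(toy_extra_entries(ex, 100, 4) == 1);	/* tail remains */
	assert(toy_extra_entries(ex, 102, 4) == 2);	/* head and tail */
}
```

Writing the whole extent is the only case with zero extra entries, which is
the case where no additional metadata block can be needed.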

> I've proposed a solution for this in the recent email with subject
> "Metadata reservation for unwritten extent conversion". Basically
> the idea is to have a reserved pool of blocks which could be used for
> exactly this (and other) cases. Note that xfs actually has the same
> thing for exactly the same reasons.
  Yeah, I've read your proposal. I don't really object to this solution as
it has advantages over "don't split extent if we are out of space" - namely
it's going to be faster than writing an extent full of zeros. But OTOH writing
zeros is so much simpler than implementing some reservation of blocks for
emergency cases that it looks like a compelling solution to me.
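
  Roughly, the "don't split extent if we are out of space" policy could be
sketched like this. Again this is a hypothetical illustration and not the
kernel's logic: enum toy_action, toy_pick_action() and its parameters are
invented names, and it assumes that a single new metadata block is always
enough for a split.

```c
#include <stdbool.h>

/* What to do with a write into an unwritten extent (toy model). */
enum toy_action {
	TOY_CONVERT_WHOLE,	/* no split, flip the extent to written */
	TOY_SPLIT_IN_PLACE,	/* split fits into free slots of the leaf */
	TOY_SPLIT_NEW_BLOCK,	/* split needs one new metadata block */
	TOY_ZERO_OUT,		/* write zeros, mark whole extent written */
	TOY_FAIL_ENOSPC		/* nothing works, the write fails */
};

/*
 * extra_entries:   extent entries the split would add (0, 1 or 2)
 * leaf_free_slots: free entries in the leaf holding the extent
 * free_blocks:     free blocks left in the filesystem
 * zeroout_allowed: whether zeroing the rest of the extent is permitted
 *                  (e.g. false today for extents beyond i_size)
 */
enum toy_action toy_pick_action(int extra_entries, int leaf_free_slots,
				unsigned long free_blocks,
				bool zeroout_allowed)
{
	if (extra_entries == 0)
		return TOY_CONVERT_WHOLE;
	if (extra_entries <= leaf_free_slots)
		return TOY_SPLIT_IN_PLACE;
	if (free_blocks > 0)
		return TOY_SPLIT_NEW_BLOCK;
	/* Out of space: avoid the split by zeroing, if that is allowed. */
	if (zeroout_allowed)
		return TOY_ZERO_OUT;
	return TOY_FAIL_ENOSPC;
}
```

In this model the ENOSPC case is reached only when the leaf is full, no
blocks are free, and zeroing out is forbidden. A reserved pool would remove
the "no blocks are free" condition instead.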

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR