On Sat, Jun 05, 2010 at 11:05:23AM -0400, tytso@xxxxxxx wrote:
> On Mon, May 24, 2010 at 11:20:34AM +0200, Jan Kara wrote:
> > Yes, exactly. I just wanted to point out that AFAICS ext4 can implement
> > proper error recovery without a need for 'punch' operation. So after all
> > Nick's copy page-by-page should be plausible at least for ext4.
>
> Sorry for my late response to this thread; I've been busy catching up
> on a number of other fronts, so I didn't have a chance to go through
> this thread until now.
>
> First of all, I'm not against implementing a 'punch' operation for
> ext4; I've actually toyed with this idea before.
>
> Secondly, I'm not sure it's really necessary; we already have a code
> path (which I was planning on making be the default when I have a
> chance to rewrite ext4_writepages) where the blocks are initially
> allocated with the 'uninitialized' flag in the extent tree; this is
> the same flag used for fallocate(2) support when we allocate blocks
> without filling in the data blocks. Then, when the block I/O
> completes, we use the block I/O callback to clear the uninit flag in
> the extent tree. This is currently used to safely avoid locking
> in the read path, which is needed to speed up access for extremely
> fast (think Fusion I/O-like) flash devices.
>
> I was already thinking about using this trick in my planned
> ext4_writepages() rewrite, and if it turns out we have common code
> that also assumes that file systems can do the equivalent fallocate(2)
> and can clear the uninitialized bit on a callback, I think that makes
> ext4 fairly similar to what XFS does, at least at the high level,
> doesn't it?
>
> Note that strictly speaking this isn't a 'punch' operation in this
> case; it's rather an fallocate(2) and don't convert the extent to mark
> the data blocks as valid on error, which is not quite the same as a
> 'punch' operation.
>
> Am I missing something?

No, this is fine. It's actually better than a punch operation from an
error recovery point of view, because it wouldn't require further
modifications to the filesystem in the error case.

AFAICS this 'uninitialised blocks' approach seems to be the best way
to do block allocations that are not tightly coupled with the
pagecache.

Do you mean ext4's file_write path?
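
To make the error-recovery argument concrete, here is a rough toy model of the scheme described above. It is only a sketch in Python, not ext4 code: `ToyFile`, `Extent`, `allocate`, `io_complete`, `read` and `BLOCK_SIZE` are all hypothetical names made up for illustration. The point it tries to show is that a failed write needs no cleanup step at all, because the extent simply stays uninitialised and reads keep returning zeros.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4096  # illustrative value only, not taken from the thread


@dataclass
class Extent:
    """Toy stand-in for one extent-tree entry covering a single block."""
    uninit: bool   # True until the data I/O has completed successfully
    data: bytes    # whatever is physically in the block (may be stale)


class ToyFile:
    """Hypothetical model of 'allocate uninitialised, convert on I/O completion'."""

    def __init__(self):
        self.extents = {}  # logical block number -> Extent

    def allocate(self, lblk, stale=b"\xff" * BLOCK_SIZE):
        # Allocate the block with the uninit flag set; 'stale' models
        # old on-disk contents that must never be exposed to a reader.
        self.extents[lblk] = Extent(uninit=True, data=stale)

    def io_complete(self, lblk, payload, ok):
        # Models the block I/O completion callback.
        ext = self.extents[lblk]
        if ok:
            ext.data = payload
            ext.uninit = False  # data is valid: clear the flag
        # On error nothing is modified: the extent stays uninitialised,
        # so no 'punch' is needed to hide the stale block.

    def read(self, lblk):
        ext = self.extents.get(lblk)
        if ext is None or ext.uninit:
            return b"\0" * BLOCK_SIZE  # holes and uninit extents read as zeros
        return ext.data


f = ToyFile()
f.allocate(0)
f.io_complete(0, b"A" * BLOCK_SIZE, ok=True)   # successful write
f.allocate(1)
f.io_complete(1, b"B" * BLOCK_SIZE, ok=False)  # failed write, no cleanup
```

In this model block 0 reads back the written data while block 1 reads back zeros, even though the error path did not touch the extent map. The blocks behind the failed write remain allocated, which is why this is not quite the same thing as a punch.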