On Thu, May 20, 2010 at 09:50:54AM +1000, Dave Chinner wrote:
> > As I said, we can have a dumb fallback path for filesystems that
> > don't implement hole punching. Clear the blocks past i size, and
> > zero out the allocated but not initialized blocks.
> >
> > There does not have to be pagecache allocated in order to do this,
> > you could do direct IO from the zero page in order to do it.
>
> I don't see that as a good solution - it's once again a fairly
> complex way of dealing with the problem, especially as it now means
> that direct io would fall back to buffered which would fall back to
> direct IO....

Well it wouldn't use the full direct IO path. It has the block, just
build a bio with the source zero page and write it out. If the fs
requires anything more fancy than that, tough, it should just
implement hole punching.

> > Hole punching is not only useful there, it is already exposed to
> > userspace via MADV_REMOVE.
>
> That interface is *totally broken*.

Why?

> It has all the same problems as
> vmtruncate() for removing file blocks (because it uses vmtruncate).
> It also has the fundamental problem of being called under the mmap_sem,
> which means that inode locks and therefore de-allocation cannot be
> executed without the possibility of deadlocks.

None of that is an API problem, it's all implementation. Yes fadvise
would be a much better API, but the madvise API is still there.

Implementation wise: it does not use vmtruncate; it has no mmap_sem
problem.

> Fundamentally, hole
> punching is an inode operation, not a VM operation....

VM acts as a handle to inode operations. It's no big deal.

> > An API that doesn't require that, though, should be less overhead
> > and simpler.
> >
> > Is it really going to be a problem to implement block hole punching
> > in ext4 and gfs2?
>
> I can't follow the ext4 code - it's an intricate maze of weird entry
> and exit points, so I'm not even going to attempt to comment on it.
>
> The gfs2 code is easier to follow and it looks like it would require
> a redesign and rewrite of the block truncation implementation as it
> appears to assume that blocks are only ever removed from the end of
> the file - I don't think the recursive algorithms for trimming the
> indirect block trees can be easily modified for punching out
> arbitrary ranges of blocks easily. I could be wrong, though, as I'm
> not a gfs2 expert....

I'm far more in favour of doing the interfaces right, and making
the filesystems fix themselves to use it.
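
For illustration only, here is a rough userspace analogue of the "dumb
fallback" idea from earlier in this message: instead of deallocating
blocks, overwrite the range with zeroes taken from a single zeroed
buffer. This is not the kernel bio path and not anyone's actual
implementation. The names zero_range, zero_range_demo and ZERO_CHUNK
are hypothetical, and the 4096-byte chunk merely stands in for one
page.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical chunk size; stands in for "the zero page". */
enum { ZERO_CHUNK = 4096 };

/* Static storage duration, so this buffer is all zeroes. */
static const char zero_chunk[ZERO_CHUNK];

/*
 * Overwrite [off, off + len) of fd with zeroes, one chunk at a time.
 * Blocks stay allocated; this only clears their contents.
 * Returns 0 on success, -1 with errno set on failure.
 */
int zero_range(int fd, off_t off, off_t len)
{
    if (off < 0 || len < 0) {
        errno = EINVAL;
        return -1;
    }
    while (len > 0) {
        size_t want = len < ZERO_CHUNK ? (size_t)len : (size_t)ZERO_CHUNK;
        ssize_t done = pwrite(fd, zero_chunk, want, off);

        if (done < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;
        }
        if (done == 0) {        /* no progress: give up, don't spin */
            errno = EIO;
            return -1;
        }
        assert((size_t)done <= want);
        off += done;            /* short writes just advance partially */
        len -= done;
    }
    return 0;
}

/*
 * Small self-check: fill a temporary file with 'A', zero the middle,
 * and verify that only the requested range changed.
 * Returns 0 on success, -1 on any failure.
 */
int zero_range_demo(void)
{
    char path[] = "/tmp/zero_range_XXXXXX";
    char buf[10000];
    size_t i;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);               /* file vanishes once fd is closed */

    memset(buf, 'A', sizeof buf);
    if (pwrite(fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf)
        goto fail;
    if (zero_range(fd, 100, 9000) != 0)
        goto fail;
    if (pread(fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf)
        goto fail;

    for (i = 0; i < sizeof buf; i++) {
        char expect = (i >= 100 && i < 9100) ? 0 : 'A';
        if (buf[i] != expect)
            goto fail;
    }
    close(fd);
    return 0;
fail:
    close(fd);
    return -1;
}
```

Calling zero_range(fd, 100, 9000) clears bytes 100 through 9099 and
leaves the rest alone. Unlike a real hole punch it frees no space,
which is the tradeoff the fallback accepts.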