On Mon, Nov 04, 2013 at 07:51:46PM -0500, Theodore Ts'o wrote:
> The application in question wants to treat a large file as if it
> were a block device --- that's hardly unprecedented; enterprise
> databases tend to prefer using raw block devices (at least for
> benchmarking purposes), but system administrators like the
> administrative convenience of using a file system.

Totally reasonable use case.

> The goal here is to get the performance as close to a raw block device
> as possible.  Especially if you are using fast flash, the overhead of
> deallocating blocks using punch, only to reallocate the blocks when we
> later write into them, is just unnecessary overhead.  Also, if you
> deallocate the blocks, they could end up getting grabbed by some other
> block allocation, which means the file can end up getting very
> fragmented --- which doesn't matter that much for flash, I suppose,
> but it means the extent tree could end up growing and getting nasty
> over time.  The bottom line is why bother doing extra work when it's
> not necessary?

Now we're getting into trouble.  I'm all for optimizing for a use case
someone cares about.  But exposing intimate implementation details of
that use case is almost always a bad idea.

So having a new fallocate mode that zeroes out parts of a file, without
requiring an allocation to back the file, is fine.  If the file is on a
filesystem that supports discards, and the device sets the "discard
zeroes blocks" flag, we can use the implementation from your patch.  If
the device doesn't support discards, or doesn't zero discarded blocks,
we'd need to implement it like the XFS_IOC_ZERO_RANGE ioctl.

Note that exposing stale blocks is a problem at the block device level,
too.  If you look at the OpenStack volume service, for example, they
have to explicitly zero out volumes during volume creation or deletion
to make sure no data is exposed to another tenant.  The only way to
avoid that is to have some auto-zeroing extent state, either in
software or in hardware.
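
To make the interface point concrete, here is a minimal user-space
sketch of a "zero this range" call whose contract is only that the
range reads back as zeros, leaving the allocation state as an
implementation detail.  It is not the patch under discussion and not
the kernel-side implementation.  The names zero_file_range() and
zero_by_writing() are hypothetical; the only assumptions are the
existing Linux fallocate(2) flags FALLOC_FL_PUNCH_HOLE and
FALLOC_FL_KEEP_SIZE, with a plain overwrite as the fallback.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/falloc.h>

/* Hypothetical fallback: overwrite [off, off + len) with zeros.
 * The range is clamped to the current file size so that, like
 * punch-hole with KEEP_SIZE, it never extends the file. */
static int zero_by_writing(int fd, off_t off, off_t len)
{
    static const char zeros[64 * 1024];   /* zero-initialized buffer */
    struct stat st;

    if (fstat(fd, &st) < 0)
        return -1;
    if (off >= st.st_size)
        return 0;                         /* nothing inside the file */
    if (len > st.st_size - off)
        len = st.st_size - off;

    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(zeros) ? (size_t)len
                                                  : sizeof(zeros);
        assert(chunk > 0);
        ssize_t n = pwrite(fd, zeros, chunk, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;                 /* interrupted, retry */
            return -1;
        }
        if (n == 0) {                     /* no progress, give up */
            errno = EIO;
            return -1;
        }
        off += n;
        len -= n;
    }
    return 0;
}

/* Hypothetical helper: make [off, off + len) read back as zeros.
 * The caller is promised the data semantics only, not whether the
 * blocks stay allocated. Returns 0 on success, -1 with errno set. */
int zero_file_range(int fd, off_t off, off_t len)
{
    if (off < 0 || len <= 0) {
        errno = EINVAL;
        return -1;
    }
    /* Fast path: deallocate the range; holes read back as zeros. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  off, len) == 0)
        return 0;
    if (errno != EOPNOTSUPP && errno != ENOSYS)
        return -1;
    /* Filesystem cannot punch holes: write zeros instead. */
    return zero_by_writing(fd, off, len);
}
```

A caller would use zero_file_range(fd, 4096, 4096) to clear the second
4 KiB of an open file.  Note that the fast path here deallocates
blocks, which is exactly the cost the quoted mail wants to avoid; the
sketch only shows that the caller-visible contract can stay the same
whichever mechanism is used underneath.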