On Mon, Jan 06, 2025 at 08:27:49AM -0800, Christoph Hellwig wrote:
> On Mon, Jan 06, 2025 at 11:17:32AM -0500, Theodore Ts'o wrote:
> > Yes.  And we might decide that it should be done using some kind of
> > ioctl, such as BLKDISCARD, as opposed to a new fallocate operation,
> > since it really isn't a filesystem metadata operation, just as
> > BLKDISCARD isn't.  The other side of the argument is that ioctls are
> > ugly, and maybe all new such operations should be plumbed through via
> > fallocate as opposed to adding a new ioctl.  I don't have strong
> > feelings on this, although I *do* believe that whatever interface we
> > use, whether it be fallocate or ioctl, it should be supported by block
> > devices and files in a file system, to make life easier for those
> > databases that want to support running on a raw block device (for
> > full-page advertisements on the back cover of the Businessweek
> > magazine) or on files (which is how 99.9% of all real-world users
> > actually run enterprise databases.  :-)
>
> If you want the operation to work for files it needs to be routed
> through the file system as otherwise you can't make it actually
> work coherently.  While you could add a new ioctl that works on a
> file, fallocate seems like a much better interface.  Supporting it
> on a block device is trivial, as it can mostly (or even entirely
> depending on the exact definition of the interface) reuse the existing
> zero range / punch hole code.

I think we should wire it up as a new FALLOC_FL_WRITE_ZEROES mode,
document very vigorously that it exists to facilitate pure overwrites
(specifically that it returns EOPNOTSUPP for always-cow files), and not
add more ioctls.

(That said, doesn't BLKZEROOUT already do this for bdevs?)

--D
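
For reference, here is a rough userspace sketch of the interfaces that
already exist today, to show the file vs. bdev split discussed above. It
is not the proposed FALLOC_FL_WRITE_ZEROES mode. zero_range() and
zero_by_write() are hypothetical helper names made up for illustration;
only fallocate(FALLOC_FL_ZERO_RANGE), ioctl(BLKZEROOUT) and pwrite() are
real. Note that FALLOC_FL_ZERO_RANGE typically leaves unwritten extents
behind rather than written zeroes, so it does not by itself guarantee
that later writes are pure overwrites.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/falloc.h>	/* FALLOC_FL_ZERO_RANGE */
#include <linux/fs.h>		/* BLKZEROOUT */

/* Hypothetical fallback: overwrite [off, off + len) with plain writes. */
static int zero_by_write(int fd, off_t off, off_t len)
{
	static const char zeroes[65536];

	while (len > 0) {
		size_t chunk = len < (off_t)sizeof(zeroes) ?
			       (size_t)len : sizeof(zeroes);
		ssize_t n = pwrite(fd, zeroes, chunk, off);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0) {
			/* No progress; report an error instead of spinning. */
			errno = EIO;
			return -1;
		}
		off += n;
		len -= n;
	}
	return 0;
}

/*
 * Hypothetical helper: zero a byte range on either a block device or a
 * regular file.  Returns 0 on success, -1 with errno set on failure.
 */
static int zero_range(int fd, off_t off, off_t len)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;

	if (S_ISBLK(st.st_mode)) {
		/*
		 * BLKZEROOUT takes { start, length } in bytes; both must
		 * be aligned to the device's logical block size.
		 */
		uint64_t range[2] = { (uint64_t)off, (uint64_t)len };

		return ioctl(fd, BLKZEROOUT, range);
	}

	/* Regular file: route the request through the file system. */
	if (fallocate(fd, FALLOC_FL_ZERO_RANGE, off, len) == 0)
		return 0;
	if (errno != EOPNOTSUPP && errno != ENOSYS)
		return -1;

	/* The file system cannot do it; fall back to writing zeroes. */
	return zero_by_write(fd, off, len);
}
```

A caller would just do zero_range(fd, offset, length) on whatever fd it
was handed. The point of a single fallocate mode that works on both
files and bdevs is that the S_ISBLK() branch above would no longer be
needed.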