On 2025/1/7 1:31, Darrick J. Wong wrote:
> On Mon, Jan 06, 2025 at 08:27:49AM -0800, Christoph Hellwig wrote:
>> On Mon, Jan 06, 2025 at 11:17:32AM -0500, Theodore Ts'o wrote:
>>> Yes. And we might decide that it should be done using some kind of
>>> ioctl, such as BLKDISCARD, as opposed to a new fallocate operation,
>>> since it really isn't a filesystem metadata operation, just as
>>> BLKDISCARD isn't. The other side of the argument is that ioctls are
>>> ugly, and maybe all new such operations should be plumbed through via
>>> fallocate as opposed to adding a new ioctl. I don't have strong
>>> feelings on this, although I *do* believe that whatever interface we
>>> use, whether it be fallocate or ioctl, it should be supported by block
>>> devices and files in a file system, to make life easier for those
>>> databases that want to support running on a raw block device (for
>>> full-page advertisements on the back cover of the Businessweek
>>> magazine) or on files (which is how 99.9% of all real-world users
>>> actually run enterprise databases. :-)
>>
>> If you want the operation to work for files it needs to be routed
>> through the file system as otherwise you can't make it actually
>> work coherently. While you could add a new ioctl that works on a
>> file, fallocate seems like a much better interface. Supporting it
>> on a block device is trivial, as it can mostly (or even entirely
>> depending on the exact definition of the interface) reuse the existing
>> zero range / punch hole code.
>
> I think we should wire it up as a new FALLOC_FL_WRITE_ZEROES mode,
> document very vigorously that it exists to facilitate pure overwrites
> (specifically that it returns EOPNOTSUPP for always-cow files), and not
> add more ioctls.
>

Sorry, the terms "pure overwrites" and "always-cow files" confuse me.
This is mainly used to create a new written file range, but it could
also be used to zero out an existing range, so why did you say it
exists to facilitate pure overwrites?

For the "always-cow files", do you mean reflinked files? Could you
please give more details?

Thanks,
Yi.

> (That said, doesn't BLKZEROOUT already do this for bdevs?)
>
> --D
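For reference, here is a rough sketch of what a database might do today to zero a range on either a raw block device or a regular file using only existing interfaces: the BLKZEROOUT ioctl mentioned above for bdevs, and FALLOC_FL_ZERO_RANGE for files, falling back to writing zero buffers when the filesystem returns EOPNOTSUPP. The helper names zero_range() and write_zero_buffers() are hypothetical. The sketch deliberately does not use the proposed FALLOC_FL_WRITE_ZEROES, which is assumed not to be available in the headers. Note that FALLOC_FL_ZERO_RANGE may leave the range as unwritten extents rather than the written range discussed above, which is the gap the new mode is meant to close.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>       /* open(), fallocate(), FALLOC_FL_* with _GNU_SOURCE */
#include <linux/falloc.h>
#include <linux/fs.h>    /* BLKZEROOUT */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical fallback: explicitly write zeroes over [off, off + len). */
static int write_zero_buffers(int fd, off_t off, off_t len)
{
    static const char zeros[4096];

    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(zeros) ? (size_t)len : sizeof(zeros);
        ssize_t n = pwrite(fd, zeros, chunk, off);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted: retry the same chunk */
            return -1;
        }
        off += n;
        len -= n;
    }
    return 0;
}

/*
 * Hypothetical helper: zero [off, off + len) on a block device or a
 * regular file. Returns 0 on success, -1 with errno set on failure.
 */
int zero_range(int fd, off_t off, off_t len)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return -1;

    if (S_ISBLK(st.st_mode)) {
        /* BLKZEROOUT takes {start, length} in bytes; the kernel
         * rejects values that are not multiples of 512. */
        uint64_t range[2] = { (uint64_t)off, (uint64_t)len };
        return ioctl(fd, BLKZEROOUT, range);
    }

    /* Regular file: ask the filesystem to zero the range. */
    if (fallocate(fd, FALLOC_FL_ZERO_RANGE, off, len) == 0)
        return 0;
    if (errno != EOPNOTSUPP)
        return -1;

    /* Filesystem does not support ZERO_RANGE (e.g. tmpfs): write zeroes. */
    return write_zero_buffers(fd, off, len);
}
```

The fallback keeps the helper usable on filesystems that reject FALLOC_FL_ZERO_RANGE, at the cost of actually writing every zero byte through the page cache.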