On Mon, Jan 06, 2025 at 03:27:52AM -0800, Christoph Hellwig wrote:
> There's a feature request for something similar on the xfs list, so
> I guess people are asking for it.

Yeah, I have folks asking for this on the ext4 side as well.

The one caution that I've given to them is that there is no guarantee
what the performance will be for WRITE SAME or equivalent operations,
since the standards documents state that performance is out of scope
for the document. So in some cases, WRITE SAME might be fast (if, for
example, it is just adjusting FTL metadata on an SSD, or doing some
similar thing on cloud-emulated block devices such as Google's
Persistent Disk or Amazon's Elastic Block Store, which Darrick has
called "software defined storage" for the cloud), but in other
hardware deployments, WRITE SAME might be as slow as writing zeros to
an HDD.

This is technically not the kernel's problem, since we can also use
the same mealy-mouthed "performance is out of scope and not the
kernel's concern" line, but that just transfers the problem to the
application programmers. I could imagine some kind of tunable with
which we can make the block device pretend that it really doesn't
support WRITE SAME if the performance characteristics are such that
it's a Bad Idea to use it. That way there's a single knob that the
system administrator can reach for, as opposed to PostgreSQL, MySQL,
Oracle Enterprise Database, etc. each having its own way of
configuring whether or not to use WRITE SAME. But that's not
something we need to decide right away.

> That being said this really should not be a modifier but a separate
> operation, as the logic is very different from FALLOC_FL_ZERO_RANGE,
> similar to how plain prealloc, hole punch and zero range are different
> operations despite all of them resulting in reads of zeroes from the
> range.

Yes.
And we might decide that it should be done using some kind of ioctl,
such as BLKDISCARD, as opposed to a new fallocate operation, since it
really isn't a filesystem metadata operation, just as BLKDISCARD isn't.
The other side of the argument is that ioctls are ugly, and maybe all
new such operations should be plumbed through via fallocate as opposed
to adding a new ioctl. I don't have strong feelings on this, although
I *do* believe that whatever interface we use, whether it be fallocate
or ioctl, it should be supported by both block devices and files in a
file system, to make life easier for those databases that want to
support running on a raw block device (for full-page advertisements on
the back cover of the Businessweek magazine) or on files (which is how
99.9% of all real-world users actually run enterprise databases. :-)

						- Ted
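
To make the "block devices and files" point concrete, here is a rough
userspace sketch of what a database has to do today if it wants one
"zero this range" call that works on both. It uses only interfaces that
already exist (FALLOC_FL_ZERO_RANGE for files, the BLKZEROOUT ioctl for
block devices), not the new operation being discussed. The helper names
zero_range() and zero_by_pwrite() are made up for illustration, and
nothing here says anything about how fast the device will be.

```c
#define _GNU_SOURCE            /* for fallocate() */
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/falloc.h>      /* FALLOC_FL_ZERO_RANGE */
#include <linux/fs.h>          /* BLKZEROOUT */

static_assert(sizeof(off_t) == 8, "this sketch assumes a 64-bit off_t");

/* Hypothetical helper: slow path, write the zeros by hand. */
static int zero_by_pwrite(int fd, off_t off, off_t len)
{
	static const char zeros[65536];

	while (len > 0) {
		size_t chunk = len < (off_t)sizeof(zeros) ?
			       (size_t)len : sizeof(zeros);
		ssize_t n = pwrite(fd, zeros, chunk, off);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0) {
			errno = EIO;
			return -1;
		}
		off += n;
		len -= n;
	}
	return 0;
}

/*
 * Hypothetical helper: zero [off, off + len) on either a block device
 * or a regular file.  Returns 0 on success, -1 with errno set.
 */
int zero_range(int fd, off_t off, off_t len)
{
	struct stat st;

	if (off < 0 || len <= 0) {
		errno = EINVAL;
		return -1;
	}
	if (fstat(fd, &st) < 0)
		return -1;

	if (S_ISBLK(st.st_mode)) {
		/*
		 * BLKZEROOUT takes { start, length } in bytes; the kernel
		 * rejects ranges not aligned to the logical block size.
		 */
		uint64_t range[2] = { (uint64_t)off, (uint64_t)len };

		return ioctl(fd, BLKZEROOUT, range);
	}

	/* Without FALLOC_FL_KEEP_SIZE this may extend the file. */
	if (fallocate(fd, FALLOC_FL_ZERO_RANGE, off, len) == 0)
		return 0;
	if (errno != EOPNOTSUPP && errno != ENOSYS)
		return -1;

	/* File system can't do it for us; fall back to plain writes. */
	return zero_by_pwrite(fd, off, len);
}
```

The S_ISBLK() branch is exactly the kind of per-application special
casing that a single interface covering both cases would remove.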