On Tue, 2025-03-18 at 23:37 +0100, Lionel Cons wrote:
> On Tue, 18 Mar 2025 at 22:17, Trond Myklebust
> <trondmy@xxxxxxxxxxxxxxx> wrote:
> >
> > On Tue, 2025-03-18 at 14:03 -0700, Rick Macklem wrote:
> > >
> > > The problem I see is that WRITE_SAME isn't defined in a way where
> > > the NFSv4 server can only implement zeroing and fail the rest.
> > > As such, I am thinking that a new operation for NFSv4.2 that does
> > > writing of zeros might be preferable to trying to (mis)use
> > > WRITE_SAME.
> >
> > Why wouldn't you just implement DEALLOCATE?
> >
>
> Oh my god.
>
> NFSv4.2 DEALLOCATE creates a hole in a sparse file, and does NOT
> write zeros.
>
> "Holes" in sparse files (as created by NFSv4.2 DEALLOCATE) represent
> areas of "no data here". For backwards compatibility these holes do
> not produce read errors, they just read as 0x00 bytes. But they
> represent ranges where no data are stored at all.
> Valid data (from allocated data ranges) can be 0x00, but those are
> NOT holes; they represent VALID DATA.
>
> This is an important difference!
> For example, if we have files, one per week, 700TB file size (100TB
> per day). Each of those files starts as completely unallocated space
> (one big hole). The data ranges are gradually allocated by writes,
> and the position of the writes in the files represents the time when
> they were collected. If no data were collected during that time, that
> space remains unallocated (a hole), so we can see whether someone
> collected data in that timeframe.
>
> Do you understand the difference?
>

Yes. I do understand the difference, but in this case you're literally
just talking about accounting. The sparse file created by DEALLOCATE
does not need to allocate the blocks (except possibly at the edges). If
you need to ensure that those empty blocks are allocated and accounted
for, then a follow-up call to ALLOCATE will do that for you.

$ touch foo
$ stat foo
  File: foo
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: 8,17    Inode: 410924125   Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2025-03-18 19:26:24.113181341 -0400
Modify: 2025-03-18 19:26:24.113181341 -0400
Change: 2025-03-18 19:26:24.113181341 -0400
 Birth: 2025-03-18 19:25:12.988344235 -0400
$ truncate -s 1GiB foo
$ stat foo
  File: foo
  Size: 1073741824      Blocks: 0          IO Block: 4096   regular file
Device: 8,17    Inode: 410924125   Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2025-03-18 19:26:24.113181341 -0400
Modify: 2025-03-18 19:27:35.161694301 -0400
Change: 2025-03-18 19:27:35.161694301 -0400
 Birth: 2025-03-18 19:25:12.988344235 -0400
$ fallocate -z -l 1GiB foo
$ stat foo
  File: foo
  Size: 1073741824      Blocks: 2097152    IO Block: 4096   regular file
Device: 8,17    Inode: 410924125   Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2025-03-18 19:26:24.113181341 -0400
Modify: 2025-03-18 19:27:54.462817356 -0400
Change: 2025-03-18 19:27:54.462817356 -0400
 Birth: 2025-03-18 19:25:12.988344235 -0400
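To make the mapping concrete, here is a rough local analogue of that
sequence using fallocate(1): punch a hole (what DEALLOCATE does), then
preallocate the same range again (what a follow-up ALLOCATE does). This
is only a sketch; the file name, offsets and sizes are arbitrary, and
the exact block accounting depends on the filesystem:

$ fallocate -l 1GiB bar              # preallocate 1GiB; no data physically written
$ fallocate -p -o 0 -l 512MiB bar    # punch a hole over the first half (DEALLOCATE analogue)
$ stat -c 'size=%s blocks=%b' bar    # block count drops: the punched range is now a hole
$ fallocate -o 0 -l 512MiB bar       # preallocate the range again (ALLOCATE analogue)
$ stat -c 'size=%s blocks=%b' bar    # blocks accounted for again, still without writing data
$ filefrag -v bar                    # extent map shows allocated (unwritten) extents, not holes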
Yes, I also realise that none of the above operations actually resulted
in blocks being physically filled with data, but all modern flash-based
drives tend to have a log-structured FTL. So while overwriting data in
the HDD era usually meant (unless you had a log-based filesystem)
overwriting the same physical space, today's drives are free to shift a
rewritten block to any new physical location in order to ensure even
wear levelling of the SSD. IOW: there is no real advantage to
physically writing out the data unless you have a peculiar interest in
wasting time.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
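If anyone wants to see what that "wasting time" looks like in practice,
a rough comparison on a scratch file is easy to run yourself; the
numbers will vary wildly with the device and filesystem, which is
rather the point (file name and size below are arbitrary):

$ time fallocate -z -l 1GiB scratch        # zero range: typically metadata-only, or offloaded to the device
$ time dd if=/dev/zero of=scratch bs=1M count=1024 oflag=direct conv=notrunc,fsync
                                           # actually pushes 1GiB of zeros down to the media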