On Tue, Mar 18, 2025 at 4:40 PM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Tue, 2025-03-18 at 23:37 +0100, Lionel Cons wrote:
> > On Tue, 18 Mar 2025 at 22:17, Trond Myklebust
> > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, 2025-03-18 at 14:03 -0700, Rick Macklem wrote:
> > > >
> > > > The problem I see is that WRITE_SAME isn't defined in a way where
> > > > the NFSv4 server can only implement zeroing and fail the rest.
> > > > As such, I am thinking that a new operation for NFSv4.2 that does
> > > > writing of zeros might be preferable to trying to (mis)use
> > > > WRITE_SAME.
> > >
> > > Why wouldn't you just implement DEALLOCATE?
> > >
> >
> > Oh my god.
> >
> > NFSv4.2 DEALLOCATE creates a hole in a sparse file, and does NOT
> > write zeros.
> >
> > "Holes" in sparse files (as created by NFSv4.2 DEALLOCATE) represent
> > areas of "no data here". For backwards compatibility these holes do
> > not produce read errors; they just read as 0x00 bytes. But they
> > represent ranges where simply no data are stored.
> > Valid data (from allocated data ranges) can be 0x00, but those are
> > NOT holes; they represent VALID DATA.
> >
> > This is an important difference!
> > For example, suppose we have files, one per week, 700TB in size
> > (100TB per day). Each of those files starts as completely
> > unallocated space (one big hole). The data ranges are gradually
> > allocated by writes, and the position of the writes in the files
> > represents the time when the data were collected. If no data were
> > collected during some interval, that space remains unallocated (a
> > hole), so we can see whether someone collected data in that
> > timeframe.
> >
> > Do you understand the difference?
> >
>
> Yes, I do understand the difference, but in this case you're literally
> just talking about accounting. The sparse file created by DEALLOCATE
> does not need to allocate the blocks (except possibly at the edges).
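The hole-vs-written-zeros distinction being argued here can be demonstrated locally. A minimal sketch in Python, assuming a Linux filesystem with sparse-file support (ext4, XFS, tmpfs, ...) — both ranges read back as 0x00, but only the explicitly written range consumes allocated blocks:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, 1 << 20)             # 1 MiB file that is one big hole
    hole_blocks = os.fstat(fd).st_blocks  # typically 0: nothing allocated
    zeros = os.pread(fd, 16, 0)           # holes still read back as 0x00

    os.pwrite(fd, b"\0" * (1 << 20), 0)   # now write "valid" zero data
    data_blocks = os.fstat(fd).st_blocks  # blocks are allocated for it
    print(hole_blocks, data_blocks, zeros == b"\0" * 16)
finally:
    os.close(fd)
    os.unlink(path)
```

This is the accounting difference: `st_blocks` (like the `Blocks:` field in `stat` output) distinguishes a hole from stored zero bytes, even though `read()` cannot.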
> If you need to ensure that those empty blocks are allocated and
> accounted for, then a follow-up call to ALLOCATE will do that for you.

Unfortunately ZFS knows how to deallocate, but not how to allocate.

>
> $ touch foo
> $ stat foo
>   File: foo
>   Size: 0               Blocks: 0          IO Block: 4096   regular empty file
> Device: 8,17    Inode: 410924125   Links: 1
> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> Context: unconfined_u:object_r:user_home_t:s0
> Access: 2025-03-18 19:26:24.113181341 -0400
> Modify: 2025-03-18 19:26:24.113181341 -0400
> Change: 2025-03-18 19:26:24.113181341 -0400
>  Birth: 2025-03-18 19:25:12.988344235 -0400
> $ truncate -s 1GiB foo
> $ stat foo
>   File: foo
>   Size: 1073741824      Blocks: 0          IO Block: 4096   regular file
> Device: 8,17    Inode: 410924125   Links: 1
> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> Context: unconfined_u:object_r:user_home_t:s0
> Access: 2025-03-18 19:26:24.113181341 -0400
> Modify: 2025-03-18 19:27:35.161694301 -0400
> Change: 2025-03-18 19:27:35.161694301 -0400
>  Birth: 2025-03-18 19:25:12.988344235 -0400
> $ fallocate -z -l 1GiB foo
> $ stat foo
>   File: foo
>   Size: 1073741824      Blocks: 2097152    IO Block: 4096   regular file
> Device: 8,17    Inode: 410924125   Links: 1
> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> Context: unconfined_u:object_r:user_home_t:s0
> Access: 2025-03-18 19:26:24.113181341 -0400
> Modify: 2025-03-18 19:27:54.462817356 -0400
> Change: 2025-03-18 19:27:54.462817356 -0400
>  Birth: 2025-03-18 19:25:12.988344235 -0400
>
> Yes, I also realise that none of the above operations actually
> resulted in blocks being physically filled with data, but all modern
> flash-based drives tend to have a log-structured FTL.
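The follow-up ALLOCATE call described above has a local analogue in `posix_fallocate()`: the filesystem reserves and accounts for the blocks without the caller writing any zeros. A hypothetical sketch in Python, assuming a Linux filesystem whose `fallocate()` is supported (ext4, XFS, tmpfs, ...):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, 1 << 20)                # sparse 1 MiB: Blocks: 0
    sparse_blocks = os.fstat(fd).st_blocks
    os.posix_fallocate(fd, 0, 1 << 20)       # "ALLOCATE" the whole range
    allocated_blocks = os.fstat(fd).st_blocks
    print(sparse_blocks, allocated_blocks)
finally:
    os.close(fd)
    os.unlink(path)
```

This mirrors the `truncate -s 1GiB` / `fallocate` transcript above: allocation changes the block accounting, yet no data passes over the wire (or, here, through the page cache as a user-initiated write).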
> So while overwriting data in the HDD era meant that you would usually
> (unless you had a log-based filesystem) overwrite the same physical
> space with data, today's drives are free to shift the rewritten block
> to any new physical location in order to ensure even wear levelling of
> the SSD.

Yeah. The Wr_zero operation writes 0s to the logical block. Does that
guarantee there is no "old block for the logical block" that still
holds the data? (It does say Wr_zero can be used for secure erasure,
but??)

Good question, for which I don't have any idea what the answer is, rick

>
> IOW: there is no real advantage to physically writing out the data
> unless you have a peculiar interest in wasting time.
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx
>
>