On Tue, Jun 27, 2023 at 02:14:41PM -0400, Matt Whitlock wrote:
> On Tuesday, 27 June 2023 01:47:57 EDT, Dave Chinner wrote:
> > On Mon, Jun 26, 2023 at 09:12:52PM -0400, Matt Whitlock wrote:
> > > Hello, all. I am experiencing a data corruption issue on Linux 6.1.24 when
> > > calling fallocate with FALLOC_FL_PUNCH_HOLE to punch out pages that have
> > > just been spliced into a pipe. It appears that the fallocate call can zero
> > > out the pages that are sitting in the pipe buffer, before those pages are
> > > read from the pipe.
> > >
> > > Simplified code excerpt (eliding error checking):
> > >
> > > int fd = /* open file descriptor referring to some disk file */;
> > > for (off_t consumed = 0;;) {
> > >     ssize_t n = splice(fd, NULL, STDOUT_FILENO, NULL, SIZE_MAX, 0);
> > >     if (n <= 0) break;
> > >     consumed += n;
> > >     fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, consumed);
> > > }
> >
> > Huh. Never seen that pattern before - what are you trying to
> > implement with this?
>
> It's part of a tool I wrote that implements an indefinitely expandable
> user-space pipe buffer backed by an unlinked-on-creation disk file. It's
> very useful as a shim in a pipeline between a process that produces a large
> but finite amount of data quickly and a process that consumes data slowly.
> My canonical use case is in my nightly backup cronjob, where I have 'tar -c'
> piped into 'xz' piped into a tool that uploads its stdin to an Amazon
> S3-compatible data store.

Neat trick.

I think that what you really want for this is something like blksnap
so you can have temporary atomic snapshots of the filesystem to take
the backup from :)

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
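A minimal C sketch of the spool-and-punch loop quoted above, assuming read()/write() in place of splice(). write() copies the bytes into pages owned by the pipe, so punching the consumed range out of the spool file afterwards cannot alter data still queued in the pipe; the price is an extra copy through user space. The helper names open_spool_file() and drain_and_punch(), the 64 KiB buffer, and the O_TMPFILE-then-mkstemp() way of getting an unlinked-on-creation file are illustrative assumptions, not the tool described in the thread. The sketch also assumes the spool file has already been filled and rewound; a real tool would interleave appending producer data with draining.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a spool file that never has a visible name.
 * Tries O_TMPFILE first; falls back to mkstemp() + unlink() on
 * filesystems that reject O_TMPFILE. Returns -1 on error. */
static int open_spool_file(const char *dir)
{
    int fd = open(dir, O_TMPFILE | O_RDWR, 0600);
    if (fd >= 0)
        return fd;

    char path[4096];
    if (snprintf(path, sizeof path, "%s/spoolXXXXXX", dir) >= (int)sizeof path)
        return -1;
    fd = mkstemp(path);
    if (fd >= 0)
        unlink(path);  /* drop the name; the open fd keeps the file alive */
    return fd;
}

/* Hypothetical helper: copy in_fd (from its current offset to EOF) to
 * out_fd with read()/write(), punching out [0, consumed) of in_fd after
 * each chunk has been written. write() copies the bytes into the pipe's
 * own pages, so the punch cannot touch data still queued there.
 * Returns the number of bytes copied, or -1 on error. */
static off_t drain_and_punch(int in_fd, int out_fd)
{
    char buf[65536];
    off_t consumed = 0;

    assert(in_fd != out_fd);
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;  /* EOF */

        /* Loop until the whole chunk is written; write() may return a
         * short count, e.g. if a signal arrives mid-write. */
        for (ssize_t off = 0; off < n;) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }

        consumed += n;
        if (fallocate(in_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      0, consumed) < 0)
            return -1;
    }
    return consumed;
}
```

Usage: obtain fd = open_spool_file("/tmp"), append incoming data to it, rewind with lseek(fd, 0, SEEK_SET), then call drain_and_punch(fd, STDOUT_FILENO). With FALLOC_FL_KEEP_SIZE the file size is unchanged, but the punched range reads back as zeros and its blocks are released.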