Re: [Reproducer] Corruption, possible race between splice and FALLOC_FL_PUNCH_HOLE

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 29 Jun 2023 08:17:38 +1000

On Wed, Jun 28, 2023 at 07:27:26PM +0100, David Howells wrote:
> Matt Whitlock <kernel@xxxxxxxxxxxxxxxxx> wrote:
> 
> > In other words, the currently implemented behavior is appropriate for
> > SPLICE_F_MOVE, but it is not appropriate for ~SPLICE_F_MOVE.
> 
> The problems with SPLICE_F_MOVE is that it's only applicable to splicing *out*
> of a pipe.  By the time you get that far the pages can already be corrupted by
> a shared-writable mmap or write().

That's not documented in the man page.

Indeed, I think Matt's point - and mine, too, for that matter - is
that the splice(2) man page documents *none* of this
"copy-by-reference" behaviour or its side effects. All the man page
documents is that the data is *copied in kernel-space* rather than
needing kernel->user->kernel data movement to copy it from one fd to
the other.

The man page *heavily implies* that splice is a "fast immediate
data copy". It most definitely does not describe any "zero-copy with
whacky post-completion data stream corrupting side effects"
mechanisms. There's not even an entry in the "notes" or "bugs"
section to warn users that they cannot trust the contents of the
source or destination pipe to be what they think they might be as
the "data copy" implied by the pipe buffer might not occur until
some arbitrary time in the future.

Hence, according to the man page, what it is doing right now is
definitely contrary to the behaviour implied by the documentation...

i.e. If the data that is "copied" to the destination pipe is not
resolved until some 3rd party process performs some future action,
then the man page must tell users they cannot use this for any sort
of data stream where the data being transferred needs to remain
stable as of the time of the splice operation.
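
To make that concrete, here is a minimal sketch of the kind of sequence
in question. It is not Matt's reproducer; the helper name and the temp
file path are made up for illustration. It splices four bytes from a
file into a pipe, overwrites the file, and only then reads the pipe.
Under the "immediate data copy" reading of the man page the reader
would see "AAAA"; if the pipe buffer is just a reference to the page
cache page, it may see "BBBB" instead. Which one you get depends on the
kernel and the filesystem, so the sketch assumes neither.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: splice 4 bytes from a temp file into a pipe,
 * overwrite those bytes in the file, then read the pipe.
 * Returns 0 on success and stores the NUL-terminated pipe data in out.
 */
static int splice_then_overwrite(char out[5])
{
	char path[] = "/tmp/splice-demo-XXXXXX";	/* illustrative path */
	int pfd[2];
	loff_t off = 0;
	int ret = -1;
	int fd;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* file lives only as long as fd is open */

	if (pipe(pfd) < 0) {
		close(fd);
		return -1;
	}

	/* Original contents of the source file. */
	if (pwrite(fd, "AAAA", 4, 0) != 4)
		goto out;

	/* "Copy" the data into the pipe; flags == 0, so no SPLICE_F_MOVE. */
	if (splice(fd, &off, pfd[1], NULL, 4, 0) != 4)
		goto out;

	/* Modify the source after splice() has already returned. */
	if (pwrite(fd, "BBBB", 4, 0) != 4)
		goto out;

	/* What the pipe reader actually sees. */
	if (read(pfd[0], out, 4) != 4)
		goto out;
	out[4] = '\0';
	ret = 0;
out:
	close(pfd[0]);
	close(pfd[1]);
	close(fd);
	return ret;
}
```

If out comes back as "BBBB", the data in the pipe was not stable as of
the time splice() returned, which is exactly the case the man page says
nothing about.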

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx