Re: [Reproducer] Corruption, possible race between splice and FALLOC_FL_PUNCH_HOLE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:

> On Wed, Jun 28, 2023 at 10:33:24AM +0100, David Howells wrote:
> > Would you consider it a kernel bug, then, if you use sendmsg(MSG_ZEROCOPY) to
> > send some data from a file mmapping that some other userspace then corrupts by
> > altering the file before the kernel has managed to send it?
> 
> I think there's a difference in that sendmsg() will block until it
> returns (... right?

Nope - sendmsg returns once the data is in the Tx queue.  It tells you about
the progress made in transmitting your zerocopy data by pasting error-report
messages with SO_EE_ORIGIN_ZEROCOPY set into the msg_control buffer when you
call recvmsg().  That tells you how many of the bytes you sent with
MSG_ZEROCOPY have so far been transmitted.  You're expected to work out from
that which of your buffers are now freed up.

Further, even if the process that issued the sendmsg() might be blocked, it
doesn't mean that some other process can modify the contents of the buffer by
write() or writing through a shared-writable mmap.

> splice() returns.  That implies the copy is done.

I thought the whole point of splice was to *avoid* the copy.  Or is it only to
avoid user<->kernel copies?

> Then the same thread modifies the file, but the pipe sees the new data, not
> the old.  Doesn't that feel like a bug to you?

It can be argued either way.  It you want to see it as a bug, then what
solution do you want?

 (1) Always copy the data into the pipe.

 (2) Always unmap and steal the pages from the pagecache, copying if we can't.

 (3) R/O-protect any PTEs mapping those pages and implement CoW.

 (4) Disallow splice() from any region that's mmapped, disallow mmap() on or
     make page_mkwrite wait for any region that's currently spliced.  Disallow
     fallocate() on or make fallocate() wait for any pages that are spliced.

2 and 3 are likely to have a performance degradation because we'll have to
zap/modify PTEs before completing the splice.  4 will have some functionality
degradation.

3 sounds particularly messy.  The problem is that if a page in the pagecache
is spliced into a pipe, we have to discard the page from the pagecache if we
would otherwise want to modify it.  I guess we'd need a splice counter adding
to struct folio?  (Or as least a PG_was_once_spliced bit).

Maybe (2a) Steal the page if unmapped, unpinned and only in use by the
pagecache, copy it otherwise?

Maybe you have a better solution?

David





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux