RE: copy on write for splice() from file to pipe?

From: Linus Torvalds
> Sent: 10 February 2023 17:24
...
> And when it comes to networking, in general things like TCP checksums
> etc should be ok even with data that isn't stable.  When doing things
> by hand, networking should always use the "copy-and-checksum"
> functions that do the checksum while copying (so even if the source
> data changes, the checksum is going to be the checksum for the data
> that was copied).
> 
> And in many (most?) smarter network cards, the card itself does the
> checksum, again on the data as it is transferred from memory.
> 
> So it's not like "networking needs a stable source" is some really
> _fundamental_ requirement for things like that to work.
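
A minimal sketch of the copy-and-checksum idea quoted above, in C. The helper copy_and_csum() is hypothetical (it is not the kernel's csum_partial_copy family) and computes the RFC 1071 Internet checksum. Each source byte is loaded once, stored, and summed from the same local value, so the checksum always describes the bytes that landed in dst even if src is being modified underneath.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy len bytes from src to dst and return the
 * RFC 1071 Internet checksum of the bytes actually written to dst.
 * src is volatile so each source byte is loaded exactly once; the value
 * that is stored is the same value that is summed. */
static uint16_t copy_and_csum(uint8_t *dst, const volatile uint8_t *src,
                              size_t len)
{
    uint64_t sum = 0;   /* 64 bits: no overflow for any realistic len */
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        uint8_t hi = src[i];
        uint8_t lo = src[i + 1];

        dst[i] = hi;
        dst[i + 1] = lo;
        sum += (uint32_t)((hi << 8) | lo);  /* big-endian 16-bit word */
    }
    if (i < len) {                          /* odd trailing byte, zero-padded */
        uint8_t hi = src[i];

        dst[i] = hi;
        sum += (uint32_t)hi << 8;
    }
    while (sum >> 16)                       /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;                  /* one's complement of the sum */
}
```

For the 8-byte example in RFC 1071 (00 01 f2 03 f4 f5 f6 f7) this returns 0x220d and leaves an identical copy in dst.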

It is also worth remembering that TCP needs to be able
to retransmit the data at a much later time.
So the application must not change the data until it has
been acked by the remote system.

Operating systems that do asynchronous IO directly from
application buffers have callbacks/events to tell the
application when it is allowed to modify the buffers.
For TCP this won't be indicated until after the ACK
is received.
I don't think io_uring has any way to indicate anything
other than 'the data has been accepted by the socket'.
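
A rough model of the rule above, assuming a hypothetical send queue: struct tx_queue, txq_send(), txq_ack() and the on_reusable callback are invented for illustration, and sequence numbers are modelled as 64-bit stream offsets, so TCP's 32-bit wraparound is ignored. The point is that "accepted by the socket" and "buffer may be reused" are separate events, and the second only fires once a cumulative ACK covers the buffer's last byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: tells the application it may now modify buf. */
typedef void (*reuse_cb)(void *ctx, const void *buf);

struct tx_buf {
    const void *data;
    uint64_t start, end;   /* stream offsets [start, end) covered by data */
};

#define TXQ_MAX 16

struct tx_queue {
    struct tx_buf bufs[TXQ_MAX];
    size_t head, count;    /* circular FIFO of not-yet-acked buffers */
    uint64_t next_seq;     /* stream offset of the next byte to queue */
    uint64_t snd_una;      /* oldest unacknowledged stream offset */
    reuse_cb on_reusable;
    void *ctx;
};

/* The socket "accepts" the data here, but the caller must not touch it
 * yet: it may still be needed for a retransmission. */
static int txq_send(struct tx_queue *q, const void *data, size_t len)
{
    if (q->count == TXQ_MAX)
        return -1;
    struct tx_buf *b = &q->bufs[(q->head + q->count) % TXQ_MAX];
    b->data = data;
    b->start = q->next_seq;
    b->end = q->next_seq + len;
    q->next_seq = b->end;
    q->count++;
    return 0;
}

/* Cumulative ACK: any buffer whose last byte is now acknowledged can never
 * be retransmitted, so only at this point is the application notified. */
static void txq_ack(struct tx_queue *q, uint64_t ack)
{
    if (ack <= q->snd_una || ack > q->next_seq)
        return;            /* stale or out-of-range ACK: ignore */
    q->snd_una = ack;
    while (q->count > 0 && q->bufs[q->head].end <= ack) {
        q->on_reusable(q->ctx, q->bufs[q->head].data);
        q->head = (q->head + 1) % TXQ_MAX;
        q->count--;
    }
}

/* Example callback: counts how many buffers have been handed back. */
static void count_reusable(void *ctx, const void *buf)
{
    (void)buf;
    ++*(int *)ctx;
}
```

With a 100-byte and a 50-byte buffer queued, an ACK for offset 60 releases nothing; only the ACK for offset 100 hands the first buffer back, and the ACK for 150 hands back the second.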

If you have 'kernel pages containing data' (e.g. from writes
into a pipe, or data received from a network) then they have
a single 'owner' and can be passed about.
But user-pages (including mmapped files) have multiple owners
so you are never going to be able to pass them as 'immutable
data'.
If you mmap a very large (and maybe sparse) file and then
try to do a very large (multi-GB) send() (with or without
any kind of page loaning) there is always the possibility
that the data that is actually sent was written while the
send() call was in progress.
Any kind of asynchronous send() just makes it more obvious.
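
A deterministic simulation of that race, assuming a hypothetical chunked_send() that copies a shared buffer (standing in for mmapped file pages) to the "wire" in fixed-size chunks, and a hypothetical writer hook that runs between chunks. A real concurrent writer could also land in the middle of a chunk; running it only between chunks just makes the outcome reproducible.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hook standing in for another owner of the shared pages
 * (another thread, or another process mapping the same file). */
typedef void (*writer_fn)(unsigned char *shared, size_t len);

/* Hypothetical model of one large send(): the source is "transmitted" in
 * fixed-size chunks (chunk must be non-zero), and the other writer gets a
 * chance to run after each chunk. */
static void chunked_send(unsigned char *wire, unsigned char *shared,
                         size_t len, size_t chunk, writer_fn other_writer)
{
    size_t done = 0;

    while (done < len) {
        size_t n = (len - done < chunk) ? len - done : chunk;

        memcpy(wire + done, shared + done, n);  /* this chunk goes out */
        done += n;
        if (other_writer)
            other_writer(shared, len);          /* concurrent update */
    }
}

/* Example writer: rewrites the whole shared region with 'B'. */
static void overwrite_with_b(unsigned char *shared, size_t len)
{
    memset(shared, 'B', len);
}
```

Starting from 8 bytes of 'A' with a 4-byte chunk, the wire ends up holding "AAAABBBB": neither the old contents nor the new ones.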

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
