On Thu, Feb 9, 2023 at 11:36 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> I guarantee that you will only slow things down with some odd async_memcpy.

Extended note: even if the copies themselves would then be done
concurrently with other work (so "not faster, but more parallel"), the
synchronization required at the end would then end up being costly
enough to eat up any possible win.

Plus you'd still end up with the fundamental problem of "what if the
data changes in the meantime".

And that's ignoring all the practical problems of actually starting the
async copy, which traditionally requires virtual-to-physical
translations (where "physical" is whatever the DMA address space is).

So I don't think there are any actual real cases of async memory copy
engines being even _remotely_ better than memcpy outside of
microcontrollers (and historical computers before caches - people may
still remember things like the Amiga blitter fondly).

Again, the exception ends up being if you can actually use real DMA to
not do a memory copy, but to transfer data directly to or from the
device. That's in some ways what 'splice()' is designed to allow you to
do, but exactly by the pipe part ending up being the "conceptual
buffer" for the zero-copy pages.

So this is exactly *why* splicing from a file all the way to the
network will then show any file changes that have happened in between
that "splice started" and "network card got the data". You're supposed
to use splice() only when you can guarantee the data stability (or,
alternatively, when you simply don't care about the data stability, and
getting the changed data is perfectly fine).

               Linus
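
(A rough sketch of the file-to-network path described above, not part
of the original mail: splice() the file into a pipe, then splice() the
pipe into the socket, with the pipe acting as the "conceptual buffer"
for the zero-copy pages. The fd names and the helper are hypothetical,
and error handling is trimmed.)

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: move 'len' bytes from an already-open file
 * descriptor to an already-open socket without copying through user
 * space.  The pipe holds the zero-copy page references in between
 * the two splice() calls. */
static int splice_file_to_socket(int file_fd, int sock_fd, size_t len)
{
    int pipefd[2];

    if (pipe(pipefd) < 0)
        return -1;

    while (len > 0) {
        /* Pull file pages into the pipe (ideally without copying). */
        ssize_t in = splice(file_fd, NULL, pipefd[1], NULL,
                            len, SPLICE_F_MOVE);
        if (in <= 0)
            break;
        len -= (size_t)in;

        /* Push those same pages out to the socket.  If the file is
         * modified before the NIC has consumed the pages, the peer
         * sees the modified data - the stability caveat above. */
        while (in > 0) {
            ssize_t out = splice(pipefd[0], NULL, sock_fd, NULL,
                                 (size_t)in,
                                 SPLICE_F_MOVE | SPLICE_F_MORE);
            if (out <= 0)
                goto out;
            in -= out;
        }
    }
out:
    close(pipefd[0]);
    close(pipefd[1]);
    return len == 0 ? 0 : -1;
}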