On 3/19/25 9:32 AM, Joe Damato wrote:
> On Wed, Mar 19, 2025 at 01:04:48AM -0700, Christoph Hellwig wrote:
>> On Wed, Mar 19, 2025 at 12:15:11AM +0000, Joe Damato wrote:
>>> One way to fix this is to add zerocopy notifications to sendfile similar
>>> to how MSG_ZEROCOPY works with sendmsg. This is possible thanks to the
>>> extensive work done by Pavel [1].
>>
>> What is a "zerocopy notification"
>
> See the docs on MSG_ZEROCOPY [1], but in short when a user app calls
> sendmsg and passes MSG_ZEROCOPY a completion notification is added
> to the error queue. The user app can poll for these to find out when
> the TX has completed and the buffer it passed to the kernel can be
> overwritten.
>
> My series provides the same functionality via splice and sendfile2.
>
> [1]: https://www.kernel.org/doc/html/v6.13/networking/msg_zerocopy.html
>
>> and why aren't you simply plugging this into io_uring and generate
>> a CQE so that it works like all other asynchronous operations?
>
> I linked to the iouring work that Pavel did in the cover letter.
> Please take a look.
>
> That work refactored the internals of how zerocopy completion
> notifications are wired up, allowing other pieces of code to use the
> same infrastructure and extend it, if needed.
>
> My series is using the same internals that iouring (and others) use
> to generate zerocopy completion notifications. Unlike iouring,
> though, I don't need a fully customized implementation with a new
> user API for harvesting completion events; I can use the existing
> mechanism already in the kernel that user apps already use for
> sendmsg (the error queue, as explained above and in the
> MSG_ZEROCOPY documentation).

The error queue is arguably a work-around for _not_ having a delivery
mechanism that works with a sync syscall in the first place. The main
question here imho would be "why add a whole new syscall etc when
there's already an existing way to accomplish this, with
free-to-reuse notifications".
If the answer is "because splice", then it would seem saner to plumb up
those bits only. Would be much simpler too...

-- 
Jens Axboe