On Tue, 30 May 2023 23:26:43 +0100 David Howells wrote: > Interesting. Now that you've pointed me at it, I've tried running it. Mostly > it passes, but I'm having some problems with the multi_chunk_sendfile tests > that time out. I think that splice_direct_to_actor() has a bug. The problem > is this bit of code: > > /* > * If more data is pending, set SPLICE_F_MORE > * If this is the last data and SPLICE_F_MORE was not set > * initially, clears it. > */ > if (read_len < len) > sd->flags |= SPLICE_F_MORE; > else if (!more) > sd->flags &= ~SPLICE_F_MORE; > > When used with sendfile(), it sets SPLICE_F_MORE (which causes MSG_MORE to be > passed to the network protocol) if we haven't yet read everything that the > user requested and clears it if we fulfilled what the user requested. > > This has the weird effect that MSG_MORE gets kind of inverted. It's never > seen by the actor if we can read the entire request into the pipe - except if > we hit the EOF first. If we hit the EOF before we fulfil the entire request, > we get a short read and SPLICE_F_MORE and thus MSG_MORE *is* set. The > upstream TLS code ignores it - but I'm changing this with my patches as > sendmsg() then uses it to mark the EOR. > > I think we probably need to fix this in some way to check the size of source > file - which may not be a regular file:-/ With the attached change, all tests > pass; without it, a bunch of tests fail with timeouts. Yeah.. it's one of those 'known warts' which we worked around in TLS because I don't know enough about VFS to confidently fix it in fs/. Proper fix would be pretty nice to have. The original-original report of the problem is here, FWIW: https://lore.kernel.org/netdev/1591392508-14592-1-git-send-email-pooja.trivedi@xxxxxxxxxxxxx/ And my somewhat hacky fix was d452d48b9f8.