On Mon, May 22, 2023 at 07:56 PM -07, John Fastabend wrote: > The read_skb() logic is incrementing the tcp->copied_seq which is used for > among other things calculating how many outstanding bytes can be read by > the application. This results in application errors, if the application > does an ioctl(FIONREAD) we return zero because this is calculated from > the copied_seq value. > > To fix this we move tcp->copied_seq accounting into the recv handler so > that we update these when the recvmsg() hook is called and data is in > fact copied into user buffers. This gives an accurate FIONREAD value > as expected and improves ACK handling. Before we were calling the > tcp_rcv_space_adjust() which would update 'number of bytes copied to > user in last RTT' which is wrong for programs returning SK_PASS. The > bytes are only copied to the user when recvmsg is handled. > > Doing the fix for recvmsg is straightforward, but fixing redirect and > SK_DROP pkts is a bit tricker. Build a tcp_psock_eat() helper and then > call this from skmsg handlers. This fixes another issue where a broken > socket with a BPF program doing a resubmit could hang the receiver. This > happened because although read_skb() consumed the skb through sock_drop() > it did not update the copied_seq. Now if a single reccv socket is > redirecting to many sockets (for example for lb) the receiver sk will be > hung even though we might expect it to continue. The hang comes from > not updating the copied_seq numbers and memory pressure resulting from > that. > > We have a slight layer problem of calling tcp_eat_skb even if its not > a TCP socket. To fix we could refactor and create per type receiver > handlers. I decided this is more work than we want in the fix and we > already have some small tweaks depending on caller that use the > helper skb_bpf_strparser(). So we extend that a bit and always set > the strparser bit when it is in use and then we can gate the > seq_copied updates on this. > > Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") > Signed-off-by: John Fastabend <john.fastabend@xxxxxxxxx> > --- Reviewed-by: Jakub Sitnicki <jakub@xxxxxxxxxxxxxx>