On Tue, Oct 1, 2024 at 2:15 AM Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx> wrote: > > Jason Xing wrote: > > On Tue, Oct 1, 2024 at 1:14 AM Willem de Bruijn > > <willemdebruijn.kernel@xxxxxxxxx> wrote: > > > > > > Jason Xing wrote: > > > > On Mon, Sep 30, 2024 at 7:49 PM Willem de Bruijn > > > > <willemdebruijn.kernel@xxxxxxxxx> wrote: > > > > > > > > > > Jason Xing wrote: > > > > > > On Mon, Sep 30, 2024 at 6:39 PM Willem de Bruijn > > > > > > <willemdebruijn.kernel@xxxxxxxxx> wrote: > > > > > > > > > > > > > > Jason Xing wrote: > > > > > > > > From: Jason Xing <kernelxing@xxxxxxxxxxx> > > > > > > > > > > > > > > > > Even though this case is unlikely to happen, we have to avoid such > > > > > > > > a case occurring at an earlier point: the sk_rmem_alloc could get > > > > > > > > increased because of inserting more and more skbs into the errqueue > > > > > > > > when calling __skb_complete_tx_timestamp(). This bad case would stop > > > > > > > > the socket transmitting soon. > > > > > > > > > > > > > > It is up to the application to read from the error queue frequently > > > > > > > enough and/or increase SO_RCVBUF. > > > > > > > > > > > > Sure thing. If we test it without setting SOF_TIMESTAMPING_SOFTWARE on > > > > > > the loopback, it will soon stop. That's the reason why I tried to add > > > > > > the restriction just in case. > > > > > > > > > > I don't follow at all. > > > > > > > > > > That bit does not affect the core issue: that the application is not > > > > > clearing its error queue quickly enough. > > > > > > > > > > > > > > > > > > > > Signed-off-by: Jason Xing <kernelxing@xxxxxxxxxxx> > > > > > > > > --- > > > > > > > > net/core/sock.c | 4 ++++ > > > > > > > > 1 file changed, 4 insertions(+) > > > > > > > > > > > > > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > > > > > > > index fe87f9bd8f16..4bddd6f62e4f 100644 > > > > > > > > --- a/net/core/sock.c > > > > > > > > +++ b/net/core/sock.c > > > > > > > > @@ -905,6 +905,10 @@ int sock_set_timestamping(struct sock *sk, int optname, > > > > > > > > if (val & ~SOF_TIMESTAMPING_MASK) > > > > > > > > return -EINVAL; > > > > > > > > > > > > > > > > + if (val & SOF_TIMESTAMPING_TX_RECORD_MASK && > > > > > > > > + !(val & SOF_TIMESTAMPING_SOFTWARE)) > > > > > > > > + return -EINVAL; > > > > > > > > + > > > > > > > > > > > > > > This breaks hardware timestamping > > > > > > > > > > > > Yes, and sorry about that. I'll fix this. > > > > > > > > > > As is I don't understand the purpose of this patch. Please do not > > > > > just resubmit with a change, but explain the problem and suggested > > > > > solution first. > > > > > > > > > > > > > I will drop this patch because I just tested with my program in the > > > > local machine and found there is one mistake I made about calculating > > > > the diff between those two . Sorry for the noise. > > > > > > > > Well, I only need to send a V2 patch of patch [3/3] in the next few days. > > > > > > > > BTW, please allow me to ask one question unrelated to this patch > > > > again. I do wonder: if we batch the recv skbs from the errqueue when > > > > calling tcp_recvmsg() -> inet_recv_error(), it could break users, > > > > right? > > > > > > Analogous to __msg_zerocopy_callback with __msg_zerocopy_callback. > > > > > > Only here we cannot return range-based results and thus cannot just > > > expand the range of the one outstanding notification. > > > > > > This would mean in ip(v6)_recv_error calling sock_dequeue_err_skb, > > > sock_recv_timestamp and put_cmsg IP_RECVERR multiple times. And > > > ip_cmsg_recv if needed. > > > > > > Existing applications do not have to expect multiple results per > > > single recvmsg call. So enabling that unconditionally could break > > > them. > > > > Thanks for your explanation! I was unsure because I read some use > > cases in github and txtimestamp.c, they can only handle one err skb at > > one time. > > > > > > > > Adding this will require a new flag. An sk_tsflag is the obvious > > > approach. > > > > > > Interpreting a MSG_* flag passed to recvmsg would be > > > another option. If there is a bit that can be set with MSG_ERRQUEUE > > > and cannot already be set currently. But I don't think that's the > > > case. We allow all bits and ignore any undefined ones. > > > > Do you feel it is necessary that we can implement this idea to > > optimize it, saving 2 or 3 syscalls at most at one time? IIRC, it's > > you who proposed that we can batch them when applying the tracepoints > > mechanism after I gave a presentation at netconf :) It's really good. > > That inspires me a lot and makes me keep wondering if we can do this > > these days. > > > > Since I've already finished the bpf for timestamping feature locally > > which bypasses receiving skbs from errqueue, > > That's great! > > > I believe it could be > > helpful for those applications that still have tendency to use the > > "traditional way" to trace. > > > > What are your thoughts on this? If you agree, do you want to do this > > on your own or allow me to give it a try? > > I'd focus on the workload that you care about most, which is the > administrator driven interface, which will use BPF. > > This micro optimization would need some benchmarks that show that it > has a measurable effect. Got it. I will post that series soon. Thanks, Jason