On Tue, Dec 5, 2023 at 3:11 AM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, 4 Dec 2023 13:28:21 +0100, Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> > On Mon, Dec 4, 2023 at 12:43 PM Philo Lu <lulie@xxxxxxxxxxxxxxxxx> wrote:
> > >
> > > Add 3 tracepoints, namely tcp_data_send/tcp_data_recv/tcp_data_acked,
> > > which will be called every time a tcp data packet is sent, received,
> > > and acked.
> > > tcp_data_send: called after a data packet is sent.
> > > tcp_data_recv: called after a data packet is received.
> > > tcp_data_acked: called after a valid ack packet is processed (some
> > > sent data are acknowledged).
> > >
> > > We use these callbacks for fine-grained tcp monitoring, which collects
> > > and analyzes information about every tcp request/response event. The
> > > whole system has been described in SIGMOD'18 (see
> > > https://dl.acm.org/doi/pdf/10.1145/3183713.3190659 for details). To
> > > achieve this with bpf, we require hooks for data events that call a
> > > bpf prog (1) when any data packet is sent/received/acked, and (2)
> > > after critical tcp state variables have been updated (e.g., snd_una,
> > > snd_nxt, rcv_nxt). However, existing bpf hooks cannot meet our
> > > requirements. Besides, these tracepoints help to debug tcp when data
> > > is sent/received/acked.
> >
> > This I do not understand.
> >
> > > Though kretprobe/fexit can also be used to collect this information,
> > > they will not work if the kernel functions get inlined. Considering
> > > stability, we prefer tracepoints as the solution.
> >
> > I dunno, this seems quite weak to me. I see many patches coming to add
> > tracing in the stack, but no patches fixing any issues.
>
> We have implemented a mechanism to split the requests and responses from
> a TCP connection using these hooks, which can handle various protocols
> such as HTTP, HTTPS, Redis, and MySQL.
> This mechanism allows us to record important information about each
> request and response, including the amount of data uploaded, the time
> taken by the server to handle the request, and the time taken for the
> client to receive the response. This mechanism has been running
> internally for many years and has proven to be very useful.
>
> One of the main benefits of this mechanism is that it helps in locating
> the source of any issues or problems that may arise. For example, if
> there is a problem with the network, the application, or the machine, we
> can use this mechanism to identify and isolate the issue.
>
> TCP has long been a challenge when it comes to tracking the transmission
> of data on the network. The application can only confirm that it has
> sent a certain amount of data to the kernel, but it has limited
> visibility into whether the client has actually received this data. Our
> mechanism addresses this issue by providing insights into the amount of
> data received by the client and the time it was received. Furthermore,
> we can also detect any packet loss or delays caused by the server.
>
> https://help-static-aliyun-doc.aliyuncs.com/assets/img/zh-CN/7912288961/9732df025beny.svg
>
> So, we do not want to add some tracepoints for some unknown debugging.
> We have a clear goal; debugging is just an incidental capability.

We have powerful mechanisms in the stack already that ordinary (no
privilege required) applications can readily use. We have been using them
for a while.

If existing mechanisms are missing something you need, please expand
them. For reference, start looking at tcp_get_timestamping_opt_stats()
history.

Sender side can for instance get precise timestamps.
Combinations of these timestamps reveal different parts of the overall
network latency:

T0: sendmsg() enters TCP
T1: first byte enters qdisc
T2: first byte sent to the NIC
T3: first byte ACKed in TCP
T4: last byte sent to the NIC
T5: last byte ACKed

T1 - T0: how long the first byte was blocked in the TCP layer ("Head of
Line Blocking" latency).
T2 - T1: how long the first byte was blocked in the Linux traffic shaping
layer (known as QDisc).
T3 - T2: the network 'distance' (propagation delay + current queuing
delay along the network path and at the receiver).
T5 - T2: how fast the sent chunk was delivered.
Message Size / (T5 - T0): goodput (from the application's perspective).