On Fri, Mar 17, 2023 at 8:20 AM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
> On Wed, 15 Mar 2023 17:20:41 +0800 Jason Xing wrote:
> > In our production environment, there are hundreds of machines that
> > often hit the old time_squeeze limit, and from that counter alone we
> > cannot tell what exactly causes the issue. Hits on the limit ranged
> > from 400 to 2000 times per second; especially when users are running
> > on a guest OS with the veth policy configured, it is relatively easy
> > to hit the limit. After several tries without this patch, I found
> > that it is only the real time_squeeze, not the budget_squeeze, that
> > hinders the receive process.
> [...]
> That is the common case, and can be understood from the napi trace

Thanks for your reply.

It happens every day on many servers.

> point and probing the kernel with bpftrace. We should only add

We can probably only deduce (or guess) which of the two causes the
latency, because trace_napi_poll() only reports the budget consumed
per poll.

Besides, tracing napi poll is fine on a testbed but not on heavily
loaded servers, where bpftrace-based tools pulling data off the hot
path can have a noticeable impact, especially on machines equipped
with high-speed NICs, say, 100G cards.

Resorting to the legacy softnet_stat file is relatively feasible,
based on my limited knowledge. Paolo also added the backlog queue
length to this file in 2020 (see commit 7d58e6555870d). I believe
that after this patch there is little or no new data that will need
to be printed there for the next few years.

> uAPI for statistics which must be maintained contiguously. For

In this patch, I didn't touch the old data, as suggested in the
previous emails, and only split the old way of counting
@time_squeeze into two parts (time_squeeze and budget_squeeze).
Having budget_squeeze available helps us profile the server and tune
it more effectively.

> investigations tracing will always be orders of magnitude more
> powerful :(
>
> On the time squeeze BTW, have you found out what the problem was?
> In workloads I've seen the time problems are often because of noise
> in how jiffies are accounted (cgroup code disables interrupts
> for long periods of time, for example, making jiffies increment
> by 2, 3 or 4 rather than by 1).

Yes! The jiffies-increment issue troubles those servers more often
than not. For a small group of servers, the budget limit is also a
problem. Sometimes we might treat the guest OS differently.

Thanks,
Jason

> > So when we encounter a related performance issue and then get lost
> > on how to tune the budget limit and the time limit in
> > net_rx_action(), we can count both of them separately to avoid the
> > confusion.
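
P.S. For anyone skimming the thread, a minimal sketch of the counter
split described above might look like the following. The helper name
and its placement are made up purely for illustration; the real
change belongs in the exit path of net_rx_action() in net/core/dev.c,
and the submitted diff may differ in detail.

	/* Illustrative only: split the single "squeeze" bump so that
	 * softnet_stat can tell "ran out of netdev_budget" apart from
	 * "ran out of netdev_budget_usecs".  The helper name is
	 * hypothetical; sd->time_squeeze exists today, while
	 * sd->budget_squeeze is the new field this series proposes.
	 */
	static void account_squeeze(struct softnet_data *sd, int budget,
				    unsigned long time_limit)
	{
		if (budget <= 0)
			sd->budget_squeeze++;	/* budget limit fired */
		if (time_after_eq(jiffies, time_limit))
			sd->time_squeeze++;	/* time limit fired */
	}

The idea is that the break condition which today does a single
sd->time_squeeze++ would instead record which limit actually fired,
so the two tuning knobs can be examined separately from softnet_stat.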