On 10/12/20 10:39 AM, Muchun Song wrote: > On Mon, Oct 12, 2020 at 3:42 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote: >> >> On Mon, Oct 12, 2020 at 6:22 AM Muchun Song <songmuchun@xxxxxxxxxxxxx> wrote: >>> >>> On Mon, Oct 12, 2020 at 2:39 AM Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote: >>>> >>>> On Sat, Oct 10, 2020 at 3:39 AM Muchun Song <songmuchun@xxxxxxxxxxxxx> wrote: >>>>> >>>>> The amount of memory allocated to sockets buffer can become significant. >>>>> However, we do not display the amount of memory consumed by sockets >>>>> buffer. In this case, knowing where the memory is consumed by the kernel >>>> >>>> We do it via `ss -m`. Is it not sufficient? And if not, why not adding it there >>>> rather than /proc/meminfo? >>> >>> If the system has little free memory, we can know where the memory is via >>> /proc/meminfo. If a lot of memory is consumed by socket buffer, we cannot >>> know it when the Sock is not shown in the /proc/meminfo. If the unaware user >>> can't think of the socket buffer, naturally they will not `ss -m`. The >>> end result >>> is that we still don’t know where the memory is consumed. And we add the >>> Sock to the /proc/meminfo just like the memcg does('sock' item in the cgroup >>> v2 memory.stat). So I think that adding to /proc/meminfo is sufficient. >>> >>>> >>>>> static inline void __skb_frag_unref(skb_frag_t *frag) >>>>> { >>>>> - put_page(skb_frag_page(frag)); >>>>> + struct page *page = skb_frag_page(frag); >>>>> + >>>>> + if (put_page_testzero(page)) { >>>>> + dec_sock_node_page_state(page); >>>>> + __put_page(page); >>>>> + } >>>>> } >>>> >>>> You mix socket page frag with skb frag at least, not sure this is exactly >>>> what you want, because clearly skb page frags are frequently used >>>> by network drivers rather than sockets. >>>> >>>> Also, which one matches this dec_sock_node_page_state()? Clearly >>>> not skb_fill_page_desc() or __skb_frag_ref(). >>> >>> Yeah, we call inc_sock_node_page_state() in the skb_page_frag_refill(). >>> So if someone gets the page returned by skb_page_frag_refill(), it must >>> put the page via __skb_frag_unref()/skb_frag_unref(). We use PG_private >>> to indicate that we need to dec the node page state when the refcount of >>> page reaches zero. >>> >> >> Pages can be transferred from pipe to socket, socket to pipe (splice() >> and zerocopy friends...) >> >> If you want to track TCP memory allocations, you always can look at >> /proc/net/sockstat, >> without adding yet another expensive memory accounting. > > The 'mem' item in the /proc/net/sockstat does not represent real > memory usage. This is just the total amount of charged memory. > > For example, if a task sends a 10-byte message, it only charges one > page to memcg. But the system may allocate 8 pages. Therefore, it > does not truly reflect the memory allocated by the above memory > allocation path. We can see the difference via the following message. > > cat /proc/net/sockstat > sockets: used 698 > TCP: inuse 70 orphan 0 tw 617 alloc 134 mem 13 > UDP: inuse 90 mem 4 > UDPLITE: inuse 0 > RAW: inuse 1 > FRAG: inuse 0 memory 0 > > cat /proc/meminfo | grep Sock > Sock: 13664 kB > > The /proc/net/sockstat only shows us that there are 17*4 kB TCP > memory allocations. But apply this patch, we can see that we truly > allocate 13664 kB(May be greater than this value because of per-cpu > stat cache). Of course the load of the example here is not high. In > some high load cases, I believe the difference here will be even > greater. > This is great, but you have not addressed my feedback. TCP memory allocations are bounded by /proc/sys/net/ipv4/tcp_mem Fact that the memory is forward allocated or not is a detail. If you think we must pre-allocate memory, instead of forward allocations, your patch does not address this. Adding one line per consumer in /proc/meminfo looks wrong to me. If you do not want 9.37 % of physical memory being possibly used by TCP, just change /proc/sys/net/ipv4/tcp_mem accordingly ?