On Mon, Oct 12, 2020 at 5:24 PM Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote: > > > > On 10/12/20 10:39 AM, Muchun Song wrote: > > On Mon, Oct 12, 2020 at 3:42 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote: > >> > >> On Mon, Oct 12, 2020 at 6:22 AM Muchun Song <songmuchun@xxxxxxxxxxxxx> wrote: > >>> > >>> On Mon, Oct 12, 2020 at 2:39 AM Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote: > >>>> > >>>> On Sat, Oct 10, 2020 at 3:39 AM Muchun Song <songmuchun@xxxxxxxxxxxxx> wrote: > >>>>> > >>>>> The amount of memory allocated to sockets buffer can become significant. > >>>>> However, we do not display the amount of memory consumed by sockets > >>>>> buffer. In this case, knowing where the memory is consumed by the kernel > >>>> > >>>> We do it via `ss -m`. Is it not sufficient? And if not, why not adding it there > >>>> rather than /proc/meminfo? > >>> > >>> If the system has little free memory, we can know where the memory is via > >>> /proc/meminfo. If a lot of memory is consumed by socket buffer, we cannot > >>> know it when the Sock is not shown in the /proc/meminfo. If the unaware user > >>> can't think of the socket buffer, naturally they will not `ss -m`. The > >>> end result > >>> is that we still don’t know where the memory is consumed. And we add the > >>> Sock to the /proc/meminfo just like the memcg does('sock' item in the cgroup > >>> v2 memory.stat). So I think that adding to /proc/meminfo is sufficient. > >>> > >>>> > >>>>> static inline void __skb_frag_unref(skb_frag_t *frag) > >>>>> { > >>>>> - put_page(skb_frag_page(frag)); > >>>>> + struct page *page = skb_frag_page(frag); > >>>>> + > >>>>> + if (put_page_testzero(page)) { > >>>>> + dec_sock_node_page_state(page); > >>>>> + __put_page(page); > >>>>> + } > >>>>> } > >>>> > >>>> You mix socket page frag with skb frag at least, not sure this is exactly > >>>> what you want, because clearly skb page frags are frequently used > >>>> by network drivers rather than sockets. > >>>> > >>>> Also, which one matches this dec_sock_node_page_state()? Clearly > >>>> not skb_fill_page_desc() or __skb_frag_ref(). > >>> > >>> Yeah, we call inc_sock_node_page_state() in the skb_page_frag_refill(). > >>> So if someone gets the page returned by skb_page_frag_refill(), it must > >>> put the page via __skb_frag_unref()/skb_frag_unref(). We use PG_private > >>> to indicate that we need to dec the node page state when the refcount of > >>> page reaches zero. > >>> > >> > >> Pages can be transferred from pipe to socket, socket to pipe (splice() > >> and zerocopy friends...) > >> > >> If you want to track TCP memory allocations, you always can look at > >> /proc/net/sockstat, > >> without adding yet another expensive memory accounting. > > > > The 'mem' item in the /proc/net/sockstat does not represent real > > memory usage. This is just the total amount of charged memory. > > > > For example, if a task sends a 10-byte message, it only charges one > > page to memcg. But the system may allocate 8 pages. Therefore, it > > does not truly reflect the memory allocated by the above memory > > allocation path. We can see the difference via the following message. > > > > cat /proc/net/sockstat > > sockets: used 698 > > TCP: inuse 70 orphan 0 tw 617 alloc 134 mem 13 > > UDP: inuse 90 mem 4 > > UDPLITE: inuse 0 > > RAW: inuse 1 > > FRAG: inuse 0 memory 0 > > > > cat /proc/meminfo | grep Sock > > Sock: 13664 kB > > > > The /proc/net/sockstat only shows us that there are 17*4 kB TCP > > memory allocations. But apply this patch, we can see that we truly > > allocate 13664 kB(May be greater than this value because of per-cpu > > stat cache). Of course the load of the example here is not high. In > > some high load cases, I believe the difference here will be even > > greater. > > > > This is great, but you have not addressed my feedback. > > TCP memory allocations are bounded by /proc/sys/net/ipv4/tcp_mem > > Fact that the memory is forward allocated or not is a detail. > > If you think we must pre-allocate memory, instead of forward allocations, > your patch does not address this. Adding one line per consumer in /proc/meminfo looks > wrong to me. I think that the consumer which consumes a lot of memory should be added to the /proc/meminfo. This can help us know the user of large memory. > > If you do not want 9.37 % of physical memory being possibly used by TCP, > just change /proc/sys/net/ipv4/tcp_mem accordingly ? We are not complaining about TCP using too much memory, but how do we know that TCP uses a lot of memory. When I firstly face this problem, I do not know who uses the 25GB memory and it is not shown in the /proc/meminfo. If we can know the amount memory of the socket buffer via /proc/meminfo, we may not need to spend a lot of time troubleshooting this problem. Not everyone knows that a lot of memory may be used here. But I believe many people should know /proc/meminfo to confirm memory users. Thanks. > > -- Yours, Muchun