On Wed, Feb 15, 2017 at 8:42 AM, Tariq Toukan <tariqt@xxxxxxxxxxxx> wrote:
>
> Isn't it the same principle in page_frag_alloc()?
> It is called from __netdev_alloc_skb()/__napi_alloc_skb().
>
> Why is it ok to have order-3 pages (PAGE_FRAG_CACHE_MAX_ORDER) there?

This is not ok.

This is a very well known problem. We have already mentioned it here in
the past, but at least the core networking stack uses order-0 pages on
PowerPC.

The mlx4 driver suffers from this problem 100% more than other drivers ;)

One problem at a time, Tariq. Right now, only mlx4 has this big problem
compared to other NICs.

Then, if we _still_ hit major issues, we might also need to force
napi_get_frags() to allocate skb->head using kmalloc() instead of a
page frag.

That is a very simple fix.

Remember that skb->truesize is an approximation. It will never be
completely accurate, but we need to make it better.

The mlx4 driver pretends to have a frag truesize of 1536 bytes, but this
is obviously wrong when the host is under memory pressure
(2 frags per page -> truesize should be 2048).

> By using netdev/napi_alloc_skb, you'll get that the SKB's linear data is a
> frag of a huge page,
> and it is not going to be freed before the other non-linear frags.
> Cannot this cause the same threats (memory pinning and so...)?
>
> Currently, mlx4 doesn't use this generic API, while most other drivers do.
>
> Similar claims are true for TX:
> https://github.com/torvalds/linux/commit/5640f7685831e088fe6c2e1f863a6805962f8e81

We do not have such a problem on TX. GFP_KERNEL allocations do not have
the same issues.

Tasks are usually not malicious in our DC, and most serious applications
use memcg or similar memory control.
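
To make the truesize arithmetic above concrete, here is a minimal sketch
in plain userspace C. It is not kernel code: frag_truesize() and
truesize_shortfall() are hypothetical helper names, and the 4096-byte
page size used in the example is an assumption chosen to match the
"2 frags per page" case.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel API): memory effectively pinned by
 * one fragment when `frags_per_page` fragments share a page of
 * `page_size` bytes.
 */
size_t frag_truesize(size_t page_size, unsigned int frags_per_page)
{
    assert(frags_per_page > 0);
    return page_size / frags_per_page;
}

/*
 * Hypothetical helper: how many bytes per fragment go unaccounted when
 * a driver advertises `advertised` bytes of truesize instead of the
 * per-fragment share of the page. Returns 0 if nothing is missing.
 */
size_t truesize_shortfall(size_t advertised, size_t page_size,
                          unsigned int frags_per_page)
{
    size_t real = frag_truesize(page_size, frags_per_page);

    return real > advertised ? real - advertised : 0;
}
```

With an assumed 4096-byte page and 2 frags per page,
frag_truesize(4096, 2) gives 2048, so advertising 1536 leaves
truesize_shortfall(1536, 4096, 2) = 512 bytes per fragment unaccounted.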