On 12/11/2024 13.22, Yunsheng Lin wrote:
On 2024/11/12 2:51, Toke Høiland-Jørgensen wrote:
...
Is there any other suggestion/concern about how to fix the problem here?
From the previous discussion, it seems the main concern about tracking the
inflight pages is how many inflight pages need to be tracked.
Yeah, my hardest objection was against putting a hard limit on the
number of outstanding pages.
If there is no other suggestion/concern, it seems the above concern might be
addressed by using pre-allocated memory to satisfy the most common case, and
falling back to dynamically allocated memory if/when necessary.
For this, my biggest concern would be performance.
In general, doing extra work in rarely used code paths (such as device
teardown) is much preferred to adding extra tracking in the fast path.
Which would be an argument for Alexander's suggestion of just scanning
the entire system page table to find pages to unmap. Don't know enough
about mm system internals to have an opinion on whether this is
feasible, though.
Yes, there seem to be many MM system internals, like the CONFIG_SPARSEMEM*
configs, memory offline/online and other MM-specific optimizations, that
make it hard to tell whether this is feasible.
It would be good if MM experts could clarify this.
Yes, please. Can Alex Duyck or MM-experts point me at some code walking
the entire system page table?
Then I'll write some kernel code (maybe a module) to benchmark how long
it takes on my machine with 384GiB. I do like Alex's suggestion,
but I want to assess the overhead of doing this on modern hardware.
In any case, we'll need some numbers to really judge the overhead in
practice. So benchmarking would be the logical next step :)
POC code shows that using dynamic memory allocation does not seem to add
much overhead compared to the pre-allocated memory in this patch; the
overhead is about 10~20ns, which seems to be similar to the overhead
added by the patch itself.
An overhead of around 10~20ns is too large for page_pool, because the XDP
DDoS use-case has a very small time budget (which is what page_pool was
designed for).
[1]
https://github.com/xdp-project/xdp-project/blob/master/areas/hints/traits01_bench_kmod.org#benchmark-basics
| Link speed | Packet rate           | Time-budget   |
|            | at smallest pkts size | per packet    |
|------------+-----------------------+---------------|
|  10 Gbit/s |  14,880,952 pps       | 67.2 nanosec  |
|  25 Gbit/s |  37,202,381 pps       | 26.88 nanosec |
| 100 Gbit/s | 148,809,523 pps       |  6.72 nanosec |
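
For reference, the numbers in the table can be reproduced with a few lines
of userspace C. This is only a sketch: it assumes a minimum-size Ethernet
frame occupies 84 bytes on the wire (64-byte frame + 8 bytes preamble/SFD +
12 bytes inter-frame gap), and the helper names (max_pps, budget_ps,
print_budget) are made up for illustration, they are not kernel APIs.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed on-wire cost of a minimum-size Ethernet frame, in bytes. */
#define MIN_FRAME_BYTES 64 /* smallest frame, including FCS */
#define PREAMBLE_BYTES   8 /* preamble + start-of-frame delimiter */
#define IFG_BYTES       12 /* inter-frame gap */
#define WIRE_BITS ((MIN_FRAME_BYTES + PREAMBLE_BYTES + IFG_BYTES) * 8) /* 672 */

/* Max packets per second at line rate (truncated, not rounded). */
static uint64_t max_pps(uint64_t link_bps)
{
	return link_bps / WIRE_BITS;
}

/* Per-packet time budget in picoseconds for a link speed given in Gbit/s. */
static uint64_t budget_ps(uint64_t link_gbps)
{
	return (uint64_t)WIRE_BITS * 1000 / link_gbps;
}

/* Print one row: link speed, packet rate, budget in ns with two decimals. */
static void print_budget(uint64_t link_gbps)
{
	uint64_t ps = budget_ps(link_gbps);

	printf("%3llu Gbit/s: %llu pps, %llu.%02llu ns per packet\n",
	       (unsigned long long)link_gbps,
	       (unsigned long long)max_pps(link_gbps * 1000000000ULL),
	       (unsigned long long)(ps / 1000),
	       (unsigned long long)((ps % 1000) / 10));
}
```

Calling print_budget() for 10, 25 and 100 from a main() gives the same
budgets as the table (the last digit of the pps column can differ by one,
since this sketch truncates instead of rounding). At 100 Gbit/s the whole
per-packet budget is 6.72 ns, so an extra 10~20ns per page is more than the
full budget.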
--Jesper