On Mon, Nov 18, 2024 at 2:27 AM Barry Song <21cnbao@xxxxxxxxx> wrote:

Thanks for the data, Barry and Tangquan!

> On Tue, Nov 12, 2024 at 10:37 AM Barry Song <21cnbao@xxxxxxxxx> wrote:
>
> Thus, "swap-in(ms) 68660," where mTHP allocation always fails, is significantly
> slower than "swap-in(ms) 21763," where mTHP allocation succeeds.

As well as the first scenario (the status quo) :(

I guess it depends on how often we are seeing this degenerate case
(i.e. how often do we see (m)THP allocation failure?)

> If there are no objections, I could send a v3 patch to fall back to 4
> small folios instead of one. However, this would significantly increase the
> complexity of do_swap_page(). My gut feeling is that the added complexity
> might not be well-received :-)

Yeah I'm curious too. I'll wait for your numbers - the dynamics are
completely unpredictable to me.

On the one hand, we'll be less wasteful in terms of CPU work (no longer
have to decompress the same chunk multiple times). OTOH, we're creating
more memory pressure (having to load the whole chunk in), without the THP
benefits.

I think this is an OK workaround for now. Increasing (m)THP allocation
success rate would be the true fix, but that is a hard problem :)

> Thanks
> Barry
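
To make the CPU-vs-memory trade-off discussed above a bit more concrete, here is a toy cost model (a rough sketch, not kernel code). It assumes one compressed chunk covers nr_pages base pages, that every page of the chunk is eventually faulted in, and that when (m)THP allocation fails each per-page fault has to decompress the whole chunk again. The function names and the cost unit (one decompressed page) are hypothetical, chosen only for illustration.

```python
def per_page_fallback(nr_pages):
    """Assumed status quo when mTHP allocation fails: each of the
    nr_pages faults decompresses the whole chunk but keeps one page."""
    return {
        "decompress_calls": nr_pages,
        # every call decompresses all nr_pages pages of the chunk
        "pages_decompressed": nr_pages * nr_pages,
        # only the faulting page is mapped after the first fault
        "resident_after_first_fault": 1,
    }


def small_folio_fallback(nr_pages):
    """Proposed workaround: on the first fault, decompress the chunk once
    and fill nr_pages small folios from it."""
    return {
        "decompress_calls": 1,
        "pages_decompressed": nr_pages,
        # the whole chunk is brought in up front (more memory pressure)
        "resident_after_first_fault": nr_pages,
    }


if __name__ == "__main__":
    # 4 matches the "fall back to 4 small folios" case in the thread
    for model in (per_page_fallback, small_folio_fallback):
        print(model.__name__, model(4))
```

Under these assumptions the workaround cuts the decompression work by a factor of nr_pages, at the cost of making the whole chunk resident on the first fault. The model says nothing about real swap-in times; those depend on how often (m)THP allocation actually fails.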