On Tue, Apr 16, 2024 at 12:18 AM Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote:
>
>
> On 2024/4/15 18:52, David Hildenbrand wrote:
> > On 15.04.24 10:59, Kefeng Wang wrote:
> >>
> >> On 2024/4/15 16:18, Barry Song wrote:
> >>> On Mon, Apr 15, 2024 at 8:12 PM Kefeng Wang
> >>> <wangkefeng.wang@xxxxxxxxxx> wrote:
> >>>>
> >>>> Both file pages and anonymous pages support large folios, so
> >>>> high-order pages other than PMD_ORDER will also be allocated
> >>>> frequently, which could increase zone lock contention. Allowing
> >>>> high-order pages on PCP lists could reduce that contention, but as
> >>>> commit 44042b449872 ("mm/page_alloc: allow high-order pages to be
> >>>> stored on the per-cpu lists") pointed out, it may not be a win in
> >>>> all scenarios. Add a new sysfs control to enable or disable storing
> >>>> specified high-order pages on PCP lists; orders in
> >>>> (PAGE_ALLOC_COSTLY_ORDER, PMD_ORDER) are not stored on PCP lists by
> >>>> default.
> >>>
> >>> This is precisely something Baolin and I have discussed and intended
> >>> to implement[1], but unfortunately, we haven't had the time to do so.
> >>
> >> Indeed, same thing. Recently we have been working on unixbench/lmbench
> >> optimization. I tested multi-size THP for anonymous memory by
> >> hard-coding PAGE_ALLOC_COSTLY_ORDER from 3 to 4[1]; it shows some
> >> improvement, but not for all cases and not very stable, so I
> >> re-implemented it according to the user requirement and made it
> >> dynamically enabled.
> >
> > I'm wondering, though, if this is really a suitable candidate for a
> > sysctl toggle. Can anybody really come up with an educated guess for
> > these values?
>
> Not sure whether such a knob is suitable, but mTHP anon is already
> enabled via sysfs, and we could trace __alloc_pages() and collect order
> statistics to decide which high orders to enable on the PCP lists.
>
> >
> > Especially reading "Benchmarks Score shows a little improvement (0.28%)"
> > and "it may not be a win in all scenarios", to me it mostly sounds like
> > "minimal impact" -- so who cares?
>
> Even though the lock contention is eliminated, the performance
> improvement is very limited (and may even just be fluctuation). It is
> not a good testcase to show the benefit, it only shows the zone-lock
> issue. We need to find a better testcase, maybe some test on Android
> (heavy use of 64K, no PMD THP), or maybe LKP could give some help?
>
> I will try to find another testcase to show the benefit.

Hi Kefeng,

I wonder if you will see some major improvement for 64KiB mTHP with the
microbench below, which I wrote just now, for example in the perf
profile and the time to finish the program.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define DATA_SIZE (2UL * 1024 * 1024)

int main(int argc, char **argv)
{
        /* make 32 concurrent alloc and free of mTHP */
        fork(); fork(); fork(); fork(); fork();

        for (int i = 0; i < 100000; i++) {
                void *addr = mmap(NULL, DATA_SIZE, PROT_READ | PROT_WRITE,
                                  MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

                if (addr == MAP_FAILED) {
                        perror("mmap");
                        return -1;
                }
                /* fault in the whole range, then free it again */
                memset(addr, 0x11, DATA_SIZE);
                munmap(addr, DATA_SIZE);
        }
        return 0;
}

> >
> > How much is the cost vs. benefit of just having one sane system
> > configuration?
> >
>
> For arm64 with 4K pages, that is five more high orders (4~8) and five
> more PCP lists; and for high orders we assume most of them are movable,
> but that may not be true, so enabling it by default may cause more
> fragmentation, see 5d0a661d808f ("mm/page_alloc: use only one PCP list
> for THP-sized allocations").
>
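Kefeng mentioned above that tracing __alloc_pages() and collecting order
statistics could help decide which high orders are worth enabling on the
PCP lists. A minimal userspace sketch of that idea, counting allocations
per order via the kmem:mm_page_alloc tracepoint, might look like the
program below; it assumes tracefs is mounted at /sys/kernel/tracing and
needs root, and it is only an illustration, not part of the patch:

/*
 * Rough sketch: count page allocations per order via the
 * kmem:mm_page_alloc tracepoint. Paths assume tracefs is mounted
 * at /sys/kernel/tracing; needs root.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>

#define TRACEFS "/sys/kernel/tracing"
#define NR_ORDERS 16

static unsigned long order_hist[NR_ORDERS];
static volatile sig_atomic_t stop;

static void on_sigint(int sig)
{
        stop = 1;
}

static void write_str(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                exit(1);
        }
        fputs(val, f);
        fclose(f);
}

int main(void)
{
        struct sigaction sa = { .sa_handler = on_sigint };
        char line[4096];
        FILE *pipe;
        int i;

        /* no SA_RESTART, so Ctrl-C interrupts the blocking read below */
        sigaction(SIGINT, &sa, NULL);

        /* enable only the page allocation tracepoint */
        write_str(TRACEFS "/events/kmem/mm_page_alloc/enable", "1");

        pipe = fopen(TRACEFS "/trace_pipe", "r");
        if (!pipe) {
                perror("trace_pipe");
                return 1;
        }

        /* each event line contains an "order=N" field */
        while (!stop && fgets(line, sizeof(line), pipe)) {
                char *p = strstr(line, "order=");

                if (!p)
                        continue;
                i = atoi(p + strlen("order="));
                if (i >= 0 && i < NR_ORDERS)
                        order_hist[i]++;
        }

        fclose(pipe);
        write_str(TRACEFS "/events/kmem/mm_page_alloc/enable", "0");

        for (i = 0; i < NR_ORDERS; i++)
                if (order_hist[i])
                        printf("order %2d: %lu allocations\n", i, order_hist[i]);

        return 0;
}

Running something like this while a workload such as the microbench
above is active gives a rough per-order allocation histogram, which
could feed the decision of which orders to put on the PCP lists.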