On 9/22/23 08:48, Ryan Roberts wrote: ...
I never had any feedback on the below; I'm not sure if that means everyone is happy or that nobody read it??
One can never really know: zero or more people read it, and of those, no one hated it enough to send out a quick NAK. So that's a *possible*, lukewarm endorsement of sorts. Success! :) ...
BUT I've had yet another idea on the controls front, which would enable exposing this to user space as an extension to transparent_hugepage, while continuing to support THP as-is and also being able to control THP and ALF (anon large folio)
The new ALF / ANON_LARGE_FOLIO naming looks good to me. Being easy to grep for is a nice touch. ...
Add 2 controls to sysfs:

/sys/kernel/mm/transparent_hugepage/anon_orders
  - bitfield where set bits are orders that will be tried during allocation
  - defaults to 1<<PMD_ORDER, which gives current THP behaviour with no ALF
  - For now, 1<<PMD_ORDER is the highest settable bit, but it is easy to expand in the future
  - To enable ALF, set the appropriate lower bits
  - To disable THP, clear 1<<PMD_ORDER
  - (In future we could add an "auto" option too)

/sys/kernel/mm/transparent_hugepage/anon_always_mask
  - orders in (anon_orders & anon_always_mask) are not subject to madvise
  - so when enabled=madvise, still try (anon_orders & anon_always_mask) orders as if enabled=always
  - defaults to 0 (all subject to madvise)
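A minimal C sketch of how a fault path might combine the two proposed knobs, based on one reading of the proposal. It is not kernel code. orders_to_try, enum thp_enabled and enum vma_hint are hypothetical names. PMD_ORDER is fixed at 9, which assumes 4K base pages. The prctl, enabled=never and MADV_NOHUGEPAGE are also assumed to override both knobs, as they do in the THP-only table.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4K base pages, so a PMD-sized (2M) folio is order 9. */
#define PMD_ORDER 9

/* Hypothetical stand-ins for the sysfs "enabled" value and the VMA hint. */
enum thp_enabled { THP_NEVER, THP_MADVISE, THP_ALWAYS };
enum vma_hint { HINT_NONE, HINT_HUGEPAGE, HINT_NOHUGEPAGE };

/*
 * Hypothetical helper: return the bitfield of orders to try for one
 * anonymous fault. A result of 0 means only a small (order-0) page.
 */
static unsigned long orders_to_try(unsigned long anon_orders,
                                   unsigned long anon_always_mask,
                                   enum thp_enabled enabled,
                                   enum vma_hint hint,
                                   bool prctl_thp_disabled)
{
    /* Proposal: 1<<PMD_ORDER is (for now) the highest settable bit. */
    assert((anon_orders >> (PMD_ORDER + 1)) == 0);

    /* Assumed hard overrides: prctl, enabled=never, MADV_NOHUGEPAGE. */
    if (prctl_thp_disabled || enabled == THP_NEVER ||
        hint == HINT_NOHUGEPAGE)
        return 0;

    /* enabled=always, or enabled=madvise with MADV_HUGEPAGE: all orders. */
    if (enabled == THP_ALWAYS || hint == HINT_HUGEPAGE)
        return anon_orders;

    /* enabled=madvise, no hint: only orders exempted by anon_always_mask. */
    return anon_orders & anon_always_mask;
}
```

With the defaults (anon_orders = 1<<PMD_ORDER, anon_always_mask = 0), this reproduces the THP-only table quoted further down. Setting an illustrative bit such as 1<<4 in both knobs would give order-4 folios under enabled=madvise even without MADV_HUGEPAGE.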
I *think* I like this a lot, although I have some clarifying questions below. It seems to address the key things that have been complicating the discussions: the API now looks more flexible, yet still easy to understand and reason about. Nice. A couple of questions about how this works:
The defaults for those controls give you "legacy THP". But you can modify the controls to generate policies like this:
For these tables, a small key or legend would help. I've already forgotten what "S" means, and I'm also vague about exactly what the "THP>ALF>S" behavior means.
THP only - existing behaviour (default):
----------------------------------------
anon_orders = 1<<PMD_ORDER
anon_always_mask = 0

thp prctl:            | dis       | ena       | ena       | ena
All I see in the prctl(2) man page is PR_SET_THP_DISABLE; I don't see any _ENABLE counterpart. What does "ena" in the prctl row above refer to?
thp sysfs:            | X         | never     | madvise   | always
----------------------|-----------|-----------|-----------|-------------
no hint               | S         | S         | S         | THP>S
MADV_HUGEPAGE         | S         | S         | THP>S     | THP>S
MADV_NOHUGEPAGE       | S         | S         | S         | S
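The thread never defines the table notation, so what follows is an assumption. Read "S" as a small (order-0) page and "THP>ALF>S" as: try a PMD-sized THP first, then any enabled smaller ALF orders, then fall back to S. Under that reading, the fallback walk over an anon_orders-style bitfield could be sketched as below. next_order is a hypothetical helper, and PMD_ORDER = 9 again assumes 4K base pages.

```c
#include <assert.h>

/* Assumption: 4K base pages, so PMD_ORDER is 9. */
#define PMD_ORDER 9

/*
 * Hypothetical helper: after an allocation at order `prev` fails, return the
 * next lower order set in `orders`. Start the walk with prev = PMD_ORDER + 1.
 * A return of 0 means the chain has reached "S", the order-0 fallback.
 */
static int next_order(unsigned long orders, int prev)
{
    assert(prev >= 1 && prev <= PMD_ORDER + 1);
    for (int order = prev - 1; order > 0; order--)
        if (orders & (1UL << order))
            return order;
    return 0;
}
```

With only the PMD_ORDER bit set (the THP-only default), the walk yields PMD_ORDER and then 0, i.e. "THP>S". Adding a lower bit inserts that order between the two, giving the "THP>ALF>S" shape.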
...
It does have the disadvantage that ALF is tied to MADV_HUGEPAGE, whereas the
Right, that is a little awkward. But maybe less so now, with this new proposal, which leaves THP a little closer to ALF.

thanks,
--
John Hubbard
NVIDIA