On Thu, May 26, 2022 at 11:30 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Wed, May 25, 2022 at 10:24:55AM +0200, Michal Hocko wrote:
> > I am not so sure about the global "never" policy, though. The global
> > policy controls _kernel_ driven THPs. As the request to collapse memory
> > comes from the userspace I do not think it should be limited by the
> > kernel policy. I also think it can be beneficial to implement userspace
> > based THP policies and exclude any kernel interference and that could be
> > achieved by global kernel "never" policy and implement the whole
> > functionality by process_madvise.
>
> I'd prefer to see "never" mean "Don't run khugepaged" rather than "Do
> not create THPs". If the app explicitly asks for a THP, I think it
> should get one, regardless of the sysadmin's will.

If we want to decouple THP allocation from khugepaged, maybe add a
dedicated switch for khugepaged, just like /sys/kernel/mm/ksm/run?
Or should I not have proposed a new knob? :-)

>
> Death to tunables. Can we just delete
> /sys/kernel/mm/transparent_hugepage/shmem_enabled entirely?

It is used to control non-mount shm objects, for example memfd and
SysV shm. tmpfs has mount options that control huge page eligibility.

Consolidate into /sys/kernel/mm/transparent_hugepage/enabled? Maybe, but
shmem_enabled has a couple of special modes:
  - within_size: only allocate huge pages if the page will be fully
    within i_size
  - force: enable THP for all mounted tmpfs and non-mount shm
  - deny: do the opposite of force

force and deny are basically used for debugging purposes.

BTW, file THP (read-only fs) is currently controlled by
/sys/kernel/mm/transparent_hugepage/enabled, since for now it can only
be created by khugepaged.
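
For anyone who wants to check which mode each of the two knobs discussed above is currently in, here is a minimal sketch in Python. It relies on the sysfs convention that these files list every accepted mode on one line and put the active one in square brackets. The helper names (active_mode, read_knob) are made up for illustration, and the set of modes shown on a given machine depends on the kernel version and config.

```python
import re
from pathlib import Path

# Directory holding the THP knobs mentioned in this thread.
THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def active_mode(text):
    """Return the bracketed token from a line like 'always madvise [never]'.

    Returns None when no token is bracketed.
    """
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_knob(name, base=THP_DIR):
    """Return (active_mode, all_modes) for a knob, or None if unreadable.

    'name' is a file under 'base', e.g. "enabled" or "shmem_enabled".
    """
    try:
        text = (base / name).read_text()
    except OSError:
        # Knob missing (THP or shmem not built in) or not readable.
        return None
    modes = [token.strip("[]") for token in text.split()]
    return active_mode(text), modes


if __name__ == "__main__":
    for knob in ("enabled", "shmem_enabled"):
        print(knob, read_knob(knob))
```

On a system where shmem_enabled is set to never, read_knob("shmem_enabled") would report "never" as the active mode alongside the full list, which is where within_size, force and deny show up next to the modes shared with "enabled".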