On Tue, Sep 08, 2020 at 10:05:11AM -0400, Zi Yan wrote:
> On 8 Sep 2020, at 7:57, David Hildenbrand wrote:
> > I have concerns if we would silently use 1 GB THPs in most scenarios
> > where we would have used 2 MB THP. I'd appreciate a trigger to
> > explicitly enable that - MADV_HUGEPAGE is not sufficient because some
> > applications relying on that assume that the THP size will be 2 MB
> > (especially, if you want sparse, large VMAs).
>
> This patchset is not intended to silently use 1GB THP in place of 2MB THP.
> First of all, there is a knob /sys/kernel/mm/transparent_hugepage/enable_1GB
> to enable 1GB THP explicitly. Also, 1GB THP is allocated from a reserved CMA
> region (although I had alloc_contig_pages as a fallback, which can be removed
> in next version), so users need to add hugepage_cma=nG kernel parameter to
> enable 1GB THP allocation. If a finer control is necessary, we can add
> a new MADV_HUGEPAGE_1GB for 1GB THP.

I think we do need that flag.  Machines don't run a single workload
(arguably with VMs, we're getting closer to going back to the single
workload per machine, but that's a different matter).  So if there's
one app that wants 2MB pages and one that wants 1GB pages, we need to
be able to distinguish them.

I could also see there being an app which benefits from 1GB for
one mapping and prefers 2MB for a different mapping, so I think the
per-mapping madvise flag is best.

I'm a little wary of encoding the size of an x86 PUD in the Linux API
though.  Probably best to follow the example set in
include/uapi/asm-generic/hugetlb_encode.h, but I don't love it.
I don't have a better suggestion though.
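
For reference, the scheme in include/uapi/asm-generic/hugetlb_encode.h
avoids naming an architecture's page-table level by storing log2 of the
page size in a 6-bit field starting at bit 26 of the flags word.  Below is
a rough userspace sketch of that idea only.  The SKETCH_* names and both
helpers are hypothetical, made up for illustration; nothing here is taken
from the patchset.  How (or whether) such an encoding would fit madvise(),
whose advice argument is a plain value rather than a flags word, is left
open.

```c
#include <stdint.h>

/*
 * Same layout as hugetlb_encode.h: log2(page size) lives in a 6-bit
 * field starting at bit 26.  SKETCH_* names are hypothetical.
 */
#define SKETCH_ENCODE_SHIFT	26
#define SKETCH_ENCODE_MASK	0x3fU

/* Pack log2 of the requested page size into the flag bits. */
static inline uint32_t sketch_encode_log2(unsigned int log2_size)
{
	return (log2_size & SKETCH_ENCODE_MASK) << SKETCH_ENCODE_SHIFT;
}

/* Recover the page size in bytes; 0 means "no size was encoded". */
static inline uint64_t sketch_decode_size(uint32_t flags)
{
	unsigned int log2_size =
		(flags >> SKETCH_ENCODE_SHIFT) & SKETCH_ENCODE_MASK;

	return log2_size ? (uint64_t)1 << log2_size : 0;
}
```

With this layout, 2MB is encoded as 21 and 1GB as 30, so the interface
talks about sizes rather than about PMDs or PUDs, and an architecture
with different huge page sizes can use the same field.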