On Tue, Sep 08, 2020 at 03:27:58PM +0100, Matthew Wilcox wrote:
> On Tue, Sep 08, 2020 at 10:05:11AM -0400, Zi Yan wrote:
> > On 8 Sep 2020, at 7:57, David Hildenbrand wrote:
> > > I have concerns if we would silently use 1~GB THPs in most scenarios
> > > where we would have used 2~MB THP. I'd appreciate a trigger to
> > > explicitly enable that - MADV_HUGEPAGE is not sufficient because some
> > > applications relying on that assume that the THP size will be 2~MB
> > > (especially, if you want sparse, large VMAs).
> >
> > This patchset is not intended to silently use 1GB THP in place of 2MB THP.
> > First of all, there is a knob /sys/kernel/mm/transparent_hugepage/enable_1GB
> > to enable 1GB THP explicitly. Also, 1GB THP is allocated from a reserved CMA
> > region (although I had alloc_contig_pages as a fallback, which can be removed
> > in next version), so users need to add hugepage_cma=nG kernel parameter to
> > enable 1GB THP allocation. If a finer control is necessary, we can add
> > a new MADV_HUGEPAGE_1GB for 1GB THP.
>
> I think we do need that flag. Machines don't run a single workload
> (arguably with VMs, we're getting closer to going back to the single
> workload per machine, but that's a different matter). So if there's
> one app that wants 2MB pages and one that wants 1GB pages, we need to
> be able to distinguish them.
>
> I could also see there being an app which benefits from 1GB for
> one mapping and prefers 2GB for a different mapping, so I think the
> per-mapping madvise flag is best.

I wonder if apps really care about the specific page size, particularly
from a portability point of view?

The general app desire seems to be a need for 'efficient' memory (e.g.
because it is heavily accessed), and I suspect that comes with a desire
to populate the pages too.

Maybe doing something with MAP_POPULATE is an idea?

E.g. if I ask for 1GB with MAP_POPULATE, it seems fairly natural that the
thing that comes back should be a 1GB THP? If I ask for only 0.5GB then
it could be 2MB pages, or whatever, depending on arch support.

Jason
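
To make the MAP_POPULATE idea concrete, here is a rough userspace-side sketch. It is only an illustration under stated assumptions: the 4KB/2MB/1GB sizes are the x86-64 ones, pick_page_size() and map_populated() are hypothetical names that exist nowhere in the patchset, and the size-based policy is my reading of the suggestion above, not anything the kernel does today. The mmap() call itself is the ordinary one and currently makes no promise about which page size backs the mapping.

```c
#define _GNU_SOURCE             /* for MAP_ANONYMOUS and MAP_POPULATE */
#include <stddef.h>
#include <sys/mman.h>

/* Assumed x86-64 page sizes; other architectures differ. */
#define SZ_4K ((size_t)1 << 12)
#define SZ_2M ((size_t)1 << 21)
#define SZ_1G ((size_t)1 << 30)

/*
 * Hypothetical policy: pick the largest page size that fits entirely
 * inside the requested length. The application never names a page size.
 */
static size_t pick_page_size(size_t len)
{
    if (len >= SZ_1G)
        return SZ_1G;
    if (len >= SZ_2M)
        return SZ_2M;
    return SZ_4K;
}

/*
 * What the application would write: a populated anonymous mapping.
 * Returns NULL on failure. Whether huge pages back the result is left
 * to the kernel; this call does not guarantee it.
 */
static void *map_populated(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

With this policy a 1GB request maps to a single 1GB page and a 0.5GB request falls back to 2MB pages, so the application code stays identical across architectures and only the kernel-side choice changes.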