On Thu, Oct 22, 2020 at 11:40:53PM -0400, Rik van Riel wrote:
> On Thu, 2020-10-22 at 19:54 -0700, Hugh Dickins wrote:
> > Michal is right to remember pushback before, because tmpfs is a
> > filesystem, and "huge=" is a mount option: in using a huge=always
> > filesystem, the user has already declared a preference for huge
> > pages.  Whereas the original anon THP had to deduce that preference
> > from sys tunables and vma madvice.
>
> ...
>
> > But it's likely that they have accumulated some defrag wisdom, which
> > tmpfs can take on board - but please accept that in using a huge
> > mount, the preference for huge has already been expressed, so I don't
> > expect anon THP alloc_hugepage_direct_gfpmask() choices will map one
> > to one.
>
> In my mind, the huge= mount options for tmpfs corresponded to the
> "enabled" anon THP options, denoting a desired end state, not
> necessarily how much we will stall allocations to get there
> immediately.
>
> The underlying allocation behavior has been changed repeatedly, with
> changes to the direct reclaim code and the compaction deferral code.
>
> The shmem THP gfp_mask never tried really hard anyway, with
> __GFP_NORETRY being the default, which matches what is used for
> non-VM_HUGEPAGE anon VMAs.
>
> Likewise, the direct reclaim done from the opportunistic THP
> allocations done by the shmem code limited itself to reclaiming
> 32 4kB pages per THP allocation.
>
> In other words, mounting with huge=always has never behaved the same
> as the more aggressive allocations done for MADV_HUGEPAGE VMAs.
>
> This patch would leave shmem THP allocations for non-MADV_HUGEPAGE
> mapped files opportunistic like today, and make shmem THP allocations
> for files mapped with MADV_HUGEPAGE more aggressive than today.
>
> However, I would like to know what people think the shmem huge= mount
> options should do, and how things should behave when memory gets low,
> before pushing in a patch just because it makes the system run
> smoother "without changing current behavior too much".
>
> What do people want tmpfs THP allocations to do?

I'm also interested in this for non-tmpfs THP allocations.  In my
patchset, THPs are no longer limited to being PMD sized, and allocating
smaller pages isn't such a tax on the VM.  So currently I'm doing:

	gfp_t gfp = readahead_gfp_mask(mapping);
	...
	struct page *page = __page_cache_alloc_order(gfp, order);

which translates to:

	gfp_t gfp = mapping_gfp_mask(mapping) | __GFP_NORETRY | __GFP_NOWARN;
	gfp |= GFP_TRANSHUGE_LIGHT;
	gfp &= ~__GFP_DIRECT_RECLAIM;

Everything's very willing to fall back to order-0 pages, but I can see
that, e.g. for a VM_HUGEPAGE vma, we should perhaps be less willing to
fall back to small pages.  I would prefer not to add a mount option to
every filesystem.  People will only get it wrong.
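
To make that last point concrete, here is a rough sketch (not from the
patchset; the helper name, the vma argument, and the exact flag choices
are hypothetical) of what "less willing to fall back" could look like:
keep today's opportunistic mask for ordinary mappings, but let a
VM_HUGEPAGE vma do direct reclaim, roughly mirroring what anon THP does
for MADV_HUGEPAGE:

	/*
	 * Illustrative only: pick a gfp mask for a large page cache
	 * allocation.  Ordinary mappings stay opportunistic (no direct
	 * reclaim); a VM_HUGEPAGE vma is allowed to reclaim/compact.
	 */
	static gfp_t thp_readahead_gfp(struct address_space *mapping,
				       struct vm_area_struct *vma)
	{
		/* mapping_gfp_mask() | __GFP_NORETRY | __GFP_NOWARN */
		gfp_t gfp = readahead_gfp_mask(mapping);

		if (vma && (vma->vm_flags & VM_HUGEPAGE))
			return gfp | GFP_TRANSHUGE;	/* allow direct reclaim */

		gfp |= GFP_TRANSHUGE_LIGHT;
		gfp &= ~__GFP_DIRECT_RECLAIM;
		return gfp;
	}

Whether the readahead path even has a vma to look at is a separate
question; this is only meant to show where such a policy decision could
sit, without needing a per-filesystem mount option.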