On Sat, 31 Jul 2021, Hugh Dickins wrote:
> On Fri, 30 Jul 2021, Yang Shi wrote:
> > On Fri, Jul 30, 2021 at 12:42 AM Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> > >
> > > Extend shmem_huge_enabled(vma) to shmem_is_huge(vma, inode, index), so
> > > that a consistent set of checks can be applied, even when the inode is
> > > accessed through read/write syscalls (with NULL vma) instead of mmaps
> > > (the index argument is seldom of interest, but required by mount option
> > > "huge=within_size"). Clean up and rearrange the checks a little.
> > >
> > > This then replaces the checks which shmem_fault() and shmem_getpage_gfp()
> > > were making, and eliminates the SGP_HUGE and SGP_NOHUGE modes: while it's
> > > still true that khugepaged's collapse_file() at that point wants a small
> > > page, the race that might allocate it a huge page is too unlikely to be
> > > worth optimizing against (we are there *because* there was at least one
> > > small page in the way), and handled by a later PageTransCompound check.
> >
> > Yes, it seems too unlikely. But if it happens the PageTransCompound
> > check may be not good enough since the page allocated by
> > shmem_getpage() may be charged to wrong memcg (root memcg). And it
> > won't be replaced by a newly allocated huge page so the wrong charge
> > can't be undone.
>
> Good point on the memcg charge: I hadn't thought of that. Of course
> it's not specific to SGP_CACHE versus SGP_NOHUGE (this patch), but I
> admit that a huge mischarge is hugely worse than a small mischarge.

Stupid me (and maybe I haven't given this enough consideration yet):
but, much better than SGP_NOHUGE, much better than SGP_CACHE, would be
SGP_READ there, wouldn't it? Needs to beware of the NULL too, of course.

Hugh