On Mon, 5 Aug 2019, Al Viro wrote:
> On Mon, Aug 05, 2019 at 07:12:55PM +0100, Al Viro wrote:
> > On Tue, Aug 06, 2019 at 01:03:06AM +0900, Sergey Senozhatsky wrote:
> > > tmpfs does not set ->remount_fs() anymore and its users need
> > > to be converted to new mount API.
> >
> > Could you explain why the devil do you bother with remount at all?
> > Why not pass the right options when mounting the damn thing?
>
> ... and while we are at it, I really wonder what's going on with
> that gemfs thing - among the other things, this is the only
> user of shmem_file_setup_with_mnt(). Sure, you want your own
> options, but that brings another question - is there any reason
> for having the huge=... per-superblock rather than per-file?

Yes: we want a default for how files of that superblock are to
allocate their pages, without people having to fcntl or advise
each of their files.

Setting aside the weirder options (within_size, advise) and emergency/
testing override (shmem_huge), we want files on an ordinary default
tmpfs (huge=never) to be allocated with small pages (so users with
access to that filesystem will not consume, and will not waste time
and space on consuming, the more valuable huge pages); but files on a
huge=always tmpfs to be allocated with huge pages whenever possible.

Or am I missing your point? Yes, hugeness can certainly be decided
differently per-file, or even per-extent of file. That is already
made possible through "judicious" use of madvise MADV_HUGEPAGE and
MADV_NOHUGEPAGE on mmaps of the file, carried over from anon THP.

Though personally I'm averse to managing "f"objects through
"m"interfaces, which can get ridiculous (notably, MADV_HUGEPAGE works
on the virtual address of a mapping, but the huge-or-not alignment of
that mapping must have been decided previously). In Google we do use
fcntls F_HUGEPAGE and F_NOHUGEPAGE to override on a per-file basis -
one day I'll get to upstreaming those.

Hugh

>
> After all, the readers of ->huge in mm/shmem.c are
> mm/shmem.c:582:     (shmem_huge == SHMEM_HUGE_FORCE || sbinfo->huge) &&
>	is_huge_enabled(), sbinfo is an explicit argument
>
> mm/shmem.c:1799:        switch (sbinfo->huge) {
>	shmem_getpage_gfp(), sbinfo comes from inode
>
> mm/shmem.c:2113:        if (SHMEM_SB(sb)->huge == SHMEM_HUGE_NEVER)
>	shmem_get_unmapped_area(), sb comes from file
>
> mm/shmem.c:3531:        if (sbinfo->huge)
> mm/shmem.c:3532:                seq_printf(seq, ",huge=%s", shmem_format_huge(sbinfo->huge));
>	->show_options()
> mm/shmem.c:3880:        switch (sbinfo->huge) {
>	shmem_huge_enabled(), sbinfo comes from an inode
>
> And the only caller of is_huge_enabled() is shmem_getattr(), with sbinfo
> picked from inode.
>
> So is there any reason why the hugepage policy can't be per-file, with
> the current being overridable default?