On Sat 21-04-18 00:33:59, Yang Shi wrote:
> Since tmpfs THP was supported in 4.8, hugetlbfs is not the only
> filesystem with huge page support anymore. tmpfs can use huge page via
> THP when mounting by "huge=" mount option.
>
> When applications use huge page on hugetlbfs, it just need check the
> filesystem magic number, but it is not enough for tmpfs. Make
> stat.st_blksize return huge page size if it is mounted by appropriate
> "huge=" option.
>
> Some applications could benefit from this change, for example QEMU.
> When use mmap file as guest VM backend memory, QEMU typically mmap the
> file size plus one extra page. If the file is on hugetlbfs the extra
> page is huge page size (i.e. 2MB), but it is still 4KB on tmpfs even
> though THP is enabled. tmpfs THP requires VMA is huge page aligned, so
> if 4KB page is used THP will not be used at all. The below /proc/meminfo
> fragment shows the THP use of QEMU with 4K page:
>
> ShmemHugePages:   679936 kB
> ShmemPmdMapped:        0 kB
>
> By reading st_blksize, tmpfs can use huge page, then /proc/meminfo looks
> like:
>
> ShmemHugePages:    77824 kB
> ShmemPmdMapped:     6144 kB
>
> statfs.f_bsize still returns 4KB for tmpfs since THP could be split, and it
> also may fallback to 4KB page silently if there is not enough huge page.
> Furthermore, different f_bsize makes max_blocks and free_blocks
> calculation harder but without too much benefit. Returning huge page
> size via stat.st_blksize sounds good enough.

I am not sure I understand the above. So does QEMU or other tmpfs users
rely on f_bsize to do mmap alignment tricks? Also I thought that THP
will be used on the first aligned address even when the initial/last
portion of the mapping is not THP aligned.

And more importantly

[...]
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -39,6 +39,7 @@
>  #include <asm/tlbflush.h> /* for arch/microblaze update_mmu_cache() */
>
>  static struct vfsmount *shm_mnt;
> +static bool is_huge = false;
>
>  #ifdef CONFIG_SHMEM
>  /*
> @@ -995,6 +996,8 @@ static int shmem_getattr(const struct path *path, struct kstat *stat,
>  		spin_unlock_irq(&info->lock);
>  	}
>  	generic_fillattr(inode, stat);
> +	if (is_huge)
> +		stat->blksize = HPAGE_PMD_SIZE;
>  	return 0;
>  }
>
> @@ -3574,6 +3577,7 @@ static int shmem_parse_options(char *options, struct shmem_sb_info *sbinfo,
>  			    huge != SHMEM_HUGE_NEVER)
>  				goto bad_val;
>  			sbinfo->huge = huge;
> +			is_huge = true;

Huh! How come this is a global flag? What if we have multiple shmem
mounts, some with huge pages enabled and some without? Btw. we seem to
already have that information stored in the superblock:

		} else if (!strcmp(this_char, "huge")) {
			int huge;
			huge = shmem_parse_huge(value);
			if (huge < 0)
				goto bad_val;
			if (!has_transparent_hugepage() &&
					huge != SHMEM_HUGE_NEVER)
				goto bad_val;
			sbinfo->huge = huge;

-- 
Michal Hocko
SUSE Labs
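
To make the multiple-mount concern concrete, here is a minimal userspace
model. It is not kernel code. Every model_* name and MODEL_* constant is a
hypothetical stand-in invented for illustration, and the 2MB huge page
size is an assumption taken from the "i.e. 2MB" example in the changelog.
It contrasts a global flag, set the way the quoted hunk sets is_huge, with
a lookup of the per-mount value that already lives in sbinfo->huge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel definitions (illustration only). */
enum { MODEL_HUGE_NEVER = 0, MODEL_HUGE_ALWAYS = 1 };

#define MODEL_PAGE_SIZE      4096UL
#define MODEL_HPAGE_PMD_SIZE (2UL * 1024 * 1024) /* assumed 2MB */

/* Models the per-mount state, like sbinfo->huge in the quoted code. */
struct model_sb_info {
	int huge;
};

/* Models the global flag added by the quoted patch. */
static bool model_is_huge;

/* Models the "huge=" branch of option parsing with the patch applied. */
static void model_parse_huge(struct model_sb_info *sbinfo, int huge)
{
	assert(sbinfo != NULL);
	sbinfo->huge = huge;
	model_is_huge = true; /* set for every mount, as in the quoted hunk */
}

/* Block size as the patch reports it: depends only on the global flag. */
static unsigned long model_blksize_global(const struct model_sb_info *sbinfo)
{
	(void)sbinfo; /* the mount being queried is ignored */
	return model_is_huge ? MODEL_HPAGE_PMD_SIZE : MODEL_PAGE_SIZE;
}

/* Alternative sketch: consult the state of the mount being queried. */
static unsigned long model_blksize_per_sb(const struct model_sb_info *sbinfo)
{
	assert(sbinfo != NULL);
	return sbinfo->huge != MODEL_HUGE_NEVER ? MODEL_HPAGE_PMD_SIZE
						: MODEL_PAGE_SIZE;
}
```

In this model, once any mount has parsed a "huge=" option,
model_blksize_global() reports the huge page size for every mount,
including one that never asked for huge pages, while
model_blksize_per_sb() still reports 4KB for that mount.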