On Wed, Dec 06, 2023 at 12:47:27PM -0800, Kees Cook wrote:
> On Tue, Dec 05, 2023 at 07:01:34PM +0300, Alexey Dobriyan wrote:
> > Report available page shifts in arch independent manner, so that
> > userspace developers won't have to parse /proc/cpuinfo hunting
> > for arch specific strings:
> >
> > Note!
> >
> > This is strictly for userspace, if some page size is shut down due
> > to kernel command line option or CPU bug workaround, then it must not
> > be reported in aux vector!
>
> Given Florian in CC, I assume this is something glibc would like to be
> using? Please mention this in the commit log.

glibc can use it. Main user is libhugetlbfs, I guess:

	https://github.com/libhugetlbfs/libhugetlbfs/blob/master/hugeutils.c#L915

The loop inside getauxval() can run faster than opendir().

> > x86_64 machine with 1 GiB pages:
> >
> > 00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
> > 00000040 1d 00 00 00 00 00 00 00 00 10 20 40 00 00 00 00
> >
> > x86_64 machine with 2 MiB pages only:
> >
> > 00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
> > 00000040 1d 00 00 00 00 00 00 00 00 10 20 00 00 00 00 00
> >
> > AT_PAGESZ is always 4096 which is not that interesting.
>
> That's not always true. For example, see arm64:
> arch/arm64/include/asm/elf.h:#define ELF_EXEC_PAGESIZE PAGE_SIZE

Yes, I'm an x86_64 guy; the AT_PAGESZ remark is about x86_64.

> I'm not actually sure why x86 forces it to 4096. I'd need to go look
> through the history there.
> > --- a/arch/x86/include/asm/elf.h
> > +++ b/arch/x86/include/asm/elf.h
> > @@ -358,6 +358,18 @@ else if (IS_ENABLED(CONFIG_IA32_EMULATION)) \
> >
> >  #define COMPAT_ELF_ET_DYN_BASE (TASK_UNMAPPED_BASE + 0x1000000)
> >
> > +#define ARCH_AT_PAGE_SHIFT_MASK \
> > +	do { \
> > +		u32 val = 1 << 12; \
> > +		if (boot_cpu_has(X86_FEATURE_PSE)) { \
> > +			val |= 1 << 21; \
> > +		} \
> > +		if (boot_cpu_has(X86_FEATURE_GBPAGES)) { \
> > +			val |= 1 << 30; \
> > +		} \
> > +		NEW_AUX_ENT(AT_PAGE_SHIFT_MASK, val); \
> > +	} while (0)
> > +
> >  #endif /* !CONFIG_X86_32 */
>
> Can't we have a generic ARCH_AT_PAGE_SHIFT_MASK too? Something like:
>
> #ifndef ARCH_AT_PAGE_SHIFT_MASK
> #define ARCH_AT_PAGE_SHIFT_MASK \
> 	NEW_AUX_ENT(AT_PAGE_SHIFT_MASK, 1 << PAGE_SHIFT)
> #endif
>
> Or am I misunderstanding something here?

1) Arch maintainers can opt into this new way to report information at
   their own pace.

2) AT_PAGE_SHIFT_MASK is about _all_ page sizes supported by the CPU.
   Reporting just one is missing the point.

I'll clarify the comment: mmap() support requires many things, including
tests for hugetlbfs being mounted; this is about CPU support.

> > --- a/fs/binfmt_elf.c
> > +++ b/fs/binfmt_elf.c
> > @@ -240,6 +240,9 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
> >  #endif
> >  	NEW_AUX_ENT(AT_HWCAP, ELF_HWCAP);
> >  	NEW_AUX_ENT(AT_PAGESZ, ELF_EXEC_PAGESIZE);
> > +#ifdef ARCH_AT_PAGE_SHIFT_MASK
> > +	ARCH_AT_PAGE_SHIFT_MASK;
> > +#endif
>
> That way we can avoid an #ifdef in the .c file.

That's a false economy. ifdefs aren't bad inherently. When all archs
implement AT_PAGE_SHIFT_MASK, the ifdef will be removed.

> > --- a/include/uapi/linux/auxvec.h
> > +++ b/include/uapi/linux/auxvec.h
> > @@ -33,6 +33,20 @@
> >  #define AT_RSEQ_FEATURE_SIZE 27 /* rseq supported feature size */
> >  #define AT_RSEQ_ALIGN 28 /* rseq allocation alignment */
> >
> > +/*
> > + * Page sizes available for mmap(2) encoded as bitmask.
> > + *
> > + * Example: x86_64 system with pse, pdpe1gb /proc/cpuinfo flags reports
> > + * 4 KiB, 2 MiB and 1 GiB page support.
> > + *
> > + * $ hexdump -C /proc/self/auxv
>
> FWIW, a more readable form is: $ LD_SHOW_AUXV=1 /bin/true

OK. It doesn't show new values as text, but OK.

> > + * 00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
> > + * 00000040 1d 00 00 00 00 00 00 00 00 10 20 40 00 00 00 00
> > + *
> > + * For 2^64 hugepage support please contact your Universe sales representative.
> > + */
> > +#define AT_PAGE_SHIFT_MASK 29
>
> ... hmm, why is 29 unused?
>
> > +
> >  #define AT_EXECFN 31 /* filename of program */
> >
> >  #ifndef AT_MINSIGSTKSZ
>
> This will need a man page update for "getauxval" as well...

Hear, hear!