Asking Arnd and others below: should noMMU arches have a good ZERO_PAGE? On Tue, 5 Apr 2022, Hugh Dickins wrote: > On Tue, 5 Apr 2022, Patrice CHOTARD wrote: > > > > We found an issue with last kernel tag v5.18-rc1 on stm32f746-disco and > > stm32h743-disco boards (ARMV7-M SoCs). > > > > Kernel hangs when executing SetPageUptodate(ZERO_PAGE(0)); in mm/filemap.c. > > > > By reverting commit 56a8c8eb1eaf ("tmpfs: do not allocate pages on read"), > > kernel boots without any issue. > > Sorry about that, thanks a lot for finding. > > I see that arch/arm/configs/stm32_defconfig says CONFIG_MMU is not set: > please confirm that is the case here. > > Yes, it looks as if NOMMU platforms are liable to have a bogus (that's my > reading, but it may be unfair) definition for ZERO_PAGE(vaddr), and I was > walking on ice to touch it without regard for !CONFIG_MMU. > > CONFIG_SHMEM depends on CONFIG_MMU, so that PageUptodate is only needed > when CONFIG_MMU. > > Easily fixed by an #ifdef CONFIG_MMU there in mm/filemap.c, but I'll hunt > around (again) for a better place to do it - though I won't want to touch > all the architectures for it. I'll post later today. I could put #ifdef CONFIG_MMU around the SetPageUptodate(ZERO_PAGE(0)) added to pagecache_init(); or if that's considered distasteful, I could skip making it potentially useful to other filesystems, revert the change to pagecache_init(), and just do it in mm/shmem.c's CONFIG_SHMEM (hence CONFIG_MMU) instance of shmem_init(). But I wonder if it's safe for noMMU architectures to go on without a working ZERO_PAGE(0). It has uses scattered throughout the tree, in drivers, fs, crypto and more, and it's not at all obvious (to me) that they all depend on CONFIG_MMU. Some might cause (unreported) crashes, some might use an unzeroed page in place of a pageful of zeroes. arm noMMU and h8300 noMMU and m68k noMMU each has #define ZERO_PAGE(vaddr) (virt_to_page(0)) which seems riskily wrong to me. h8300 and m68k actually go to the trouble of allocating an empty_zero_page for this, but then forget to link it up to the ZERO_PAGE(vaddr) definition, which is what all the common code uses. arm noMMU does not presently allocate such a page; and I do not feel entitled to steal a page from arm noMMU platforms, for a hypothetical case, without agreement. But here's an unbuilt and untested patch for consideration - which of course should be split in three if agreed (and perhaps the h8300 part quietly forgotten if h8300 is already on its way out). (Yes, arm uses empty_zero_page in a different way from all the other architectures; but that's okay, and I think arm's way, with virt_to_page() already baked in, is better than the others; but I've no wish to get into changing them.) Patrice, does this patch build and run for you? I have no appreciation of arm early startup issues, and may have got it horribly wrong. Thanks, Hugh arch/arm/include/asm/pgtable-nommu.h | 3 ++- arch/arm/mm/nommu.c | 16 ++++++++++++++++ arch/h8300/include/asm/pgtable.h | 6 +++++- arch/h8300/mm/init.c | 5 +++-- arch/m68k/include/asm/pgtable_no.h | 5 ++++- 5 files changed, 30 insertions(+), 5 deletions(-) --- a/arch/arm/include/asm/pgtable-nommu.h +++ b/arch/arm/include/asm/pgtable-nommu.h @@ -48,7 +48,8 @@ typedef pte_t *pte_addr_t; * ZERO_PAGE is a global shared page that is always zero: used * for zero-mapped memory areas etc.. */ -#define ZERO_PAGE(vaddr) (virt_to_page(0)) +extern struct page *empty_zero_page; +#define ZERO_PAGE(vaddr) (empty_zero_page) /* * Mark the prot value as uncacheable and unbufferable. --- a/arch/arm/mm/nommu.c +++ b/arch/arm/mm/nommu.c @@ -24,6 +24,13 @@ #include "mm.h" +/* + * empty_zero_page is a special page that is used for + * zero-initialized data and COW. + */ +struct page *empty_zero_page; +EXPORT_SYMBOL(empty_zero_page); + unsigned long vectors_base; #ifdef CONFIG_ARM_MPU @@ -148,9 +155,18 @@ void __init adjust_lowmem_bounds(void) */ void __init paging_init(const struct machine_desc *mdesc) { + void *zero_page; + early_trap_init((void *)vectors_base); mpu_setup(); bootmem_init(); + + zero_page = memblock_alloc(PAGE_SIZE, PAGE_SIZE); + if (!zero_page) + panic("%s: Failed to allocate %lu bytes align=0x%lx\n", + __func__, PAGE_SIZE, PAGE_SIZE); + empty_zero_page = virt_to_page(zero_page); + flush_dcache_page(empty_zero_page); } /* --- a/arch/h8300/include/asm/pgtable.h +++ b/arch/h8300/include/asm/pgtable.h @@ -19,11 +19,15 @@ extern void paging_init(void); static inline int pte_file(pte_t pte) { return 0; } #define swapper_pg_dir ((pgd_t *) 0) + +/* zero page used for uninitialized stuff */ +extern void *empty_zero_page; + /* * ZERO_PAGE is a global shared page that is always zero: used * for zero-mapped memory areas etc.. */ -#define ZERO_PAGE(vaddr) (virt_to_page(0)) +#define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page)) /* * These would be in other places but having them here reduces the diffs. --- a/arch/h8300/mm/init.c +++ b/arch/h8300/mm/init.c @@ -41,7 +41,8 @@ * ZERO_PAGE is a special page that is used for zero-initialized * data and COW. */ -unsigned long empty_zero_page; +void *empty_zero_page; +EXPORT_SYMBOL(empty_zero_page); /* * paging_init() continues the virtual memory environment setup which @@ -65,7 +66,7 @@ void __init paging_init(void) * Initialize the bad page table and bad page to point * to a couple of allocated pages. */ - empty_zero_page = (unsigned long)memblock_alloc(PAGE_SIZE, PAGE_SIZE); + empty_zero_page = memblock_alloc(PAGE_SIZE, PAGE_SIZE); if (!empty_zero_page) panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, PAGE_SIZE, PAGE_SIZE); --- a/arch/m68k/include/asm/pgtable_no.h +++ b/arch/m68k/include/asm/pgtable_no.h @@ -38,11 +38,14 @@ extern void paging_init(void); #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) }) #define __swp_entry_to_pte(x) ((pte_t) { (x).val }) +/* zero page used for uninitialized stuff */ +extern void *empty_zero_page; + /* * ZERO_PAGE is a global shared page that is always zero: used * for zero-mapped memory areas etc.. */ -#define ZERO_PAGE(vaddr) (virt_to_page(0)) +#define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page)) /* * All 32bit addresses are effectively valid for vmalloc...