On 10/8/19 10:35 AM, Kefeng Wang wrote: > Hi Vlastimil and all, > > We met a Null pointer when do page_to_pfn() in __free_one_page() in older kernel, __nr_to_section(__sec) return NULL, > > #define __page_to_pfn(pg) \ > ({ const struct page *__pg = (pg); \ > int __sec = page_to_section(__pg); \ > (unsigned long)(__pg - __section_mem_map_addr(__nr_to_section(__sec))); \ > }) > > Before v4.11, __free_one_page() use pfn_valid_within(page_to_pfn(buddy)) to check pfn, after the Hmm, looks like the code before v4.11 was wrong. pfn_valid_(within) should be checked first, before obtaining and working with the struct page. Here we already have a struct page obtained by pointer arithmetics, and are calling page_to_pfn() on it, which means accessing its flags. The pfn_valid_within() then comes too late. > following patches, it use buddy_pfn directly. > > b4fb8f66f1ae mm, page_alloc: Add missing check for memory holes > 13ad59df67f1 mm, page_alloc: avoid page_to_pfn() when merging buddies // NOTE: directly use buddy_pfn > 76741e776a37 mm, page_alloc: don't convert pfn to idx when merging Looks like my patches fixed that code without realizing there was the bug. Commit b4fb8f66f1ae shows I've also introduced it elsewhere. > If use page_to_pfn(buddy) back in mainline 5.4-rc2, the same issue will occur, No surprise, we must validate pfn first before touching the page. > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index c0b2e0306720..fbbfe8e8b4ca 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -934,7 +934,7 @@ static inline void __free_one_page(struct page *page, > buddy_pfn = __find_buddy_pfn(pfn, order); > buddy = page + (buddy_pfn - pfn); > > - if (!pfn_valid_within(buddy_pfn)) > + if (!pfn_valid_within(page_to_pfn(buddy))) > goto done_merging; > if (!page_is_buddy(page, buddy, order)) > goto done_merging; > > It shows the buddy->flags is wrong, that is, buddy is bad, we find the buddy by page + (buddy_pfn - pfn), > so there is some issue in __find_buddy_pfn(pfn, order)? No, result of __find_buddy_pfn has to be validated first. > The following is the debug print and CallTrace, any comment? > > Thanks, > Kefeng > > 1) MEMBLOCK configuration: > memory size = 0x0000000036800000 reserved size = 0x0000000004ca7fbc > memory.cnt = 0x4 > memory[0x0] [0x0000000000000000-0x0000000013ffffff], 0x0000000014000000 bytes flags: 0x0 > memory[0x1] [0x000000002d600000-0x0000000033ffffff], 0x0000000006a00000 bytes flags: 0x0 > memory[0x2] [0x0000000034800000-0x00000000445fffff], 0x000000000fe00000 bytes flags: 0x0 > memory[0x3] [0x0000000044a00000-0x00000000509fffff], 0x000000000c000000 bytes flags: 0x0 > reserved.cnt = 0x6 > reserved[0x0] [0x000000002d680000-0x000000002e7c8fff], 0x0000000001149000 bytes flags: 0x0 > reserved[0x1] [0x0000000030b00000-0x000000003239cfff], 0x000000000189d000 bytes flags: 0x0 > reserved[0x2] [0x0000000032400000-0x00000000324fffff], 0x0000000000100000 bytes flags: 0x0 > reserved[0x3] [0x000000004e000000-0x000000004fffffff], 0x0000000002000000 bytes flags: 0x0 > reserved[0x4] [0x000000005083e040-0x0000000050849ffb], 0x000000000000bfbc bytes flags: 0x0 > reserved[0x5] [0x000000005084a000-0x00000000509fffff], 0x00000000001b6000 bytes flags: 0x0 These might be holes in the zones, right. > 2) CONFIG > CONFIG_SPARSEMEM_MANUAL=y > CONFIG_SPARSEMEM=y > CONFIG_HAVE_MEMORY_PRESENT=y > CONFIG_SPARSEMEM_EXTREME=y > CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y > # CONFIG_SPARSEMEM_VMEMMAP is not set Is CONFIG_HOLES_IN_ZONE enabled? Probably yes as that's arm64. > 3) debug print and CallTrace > __free_one_page , order = 9, max_order = 11, page = ffffff804e128000, buddy = ffffff804e120000, sec = 42623, mem_section = 0000000000000000 I would assume buddy is in one of the holes, but you'd have to print the pfn's to be sure. > buddy = ffffff804e120000, __sec = 42623, buddy->flags = 299fcc27aebc552f, SECTIONS_PGSHIFT = 46, SECTIONS_MASK = 3ffff > __section_mem_map_addr, section = 0000000000000000 > ------------[ cut here ]------------ > Ignoring spurious kernel translation fault at virtual address 0000000000000000 > WARNING: CPU: 1 PID: 29 at ../arch/arm64/mm/fault.c:302 __do_kernel_fault+0xb8/0x130 > Modules linked in: > CPU: 1 PID: 29 Comm: kworker/1:1 Not tainted 5.4.0-rc2+ #16 > Hardware name: linux,dummy-virt (DT) > Workqueue: events delayed_fput > pstate: 60000085 (nZCv daIf -PAN -UAO) > pc : __do_kernel_fault+0xb8/0x130 > lr : __do_kernel_fault+0xb8/0x130 > sp : ffffffc011273560 > x29: ffffffc011273560 x28: ffffff805020ce00 > x27: ffffff804e128000 x26: ffffff804e120000 > x25: 0000000000000000 x24: 0000000000000025 > x23: 0000000096000005 x22: 0000000000000025 > x21: 0000000096000005 x20: ffffffc011273650 > x19: 0000000000000000 x18: 0000000000000000 > x17: 00000000f06e9c14 x16: 0000000000000014 > x15: 0000000000000000 x14: 7320303030303030 > x13: 6c20616464726573 x12: 7420766972747561 > x11: 206661756c742061 x10: 6e736c6174696f6e > x9 : 726e656c20747261 x8 : 72696f7573206b65 > x7 : 72696e6720737075 x6 : 0000000000000000 > x5 : 0000000000000001 x4 : 0000000000000000 > x3 : 0000000000000000 x2 : 0000000000000000 > x1 : 0095a39d527dc9a0 x0 : 0000000000000000 > Call trace: > __do_kernel_fault+0xb8/0x130 > do_page_fault+0x60/0x3ec > do_translation_fault+0x40/0x78 > do_mem_abort+0x50/0xa8 > el1_da+0x20/0x94 > __free_one_page+0x1d8/0x31c > free_pcppages_bulk+0x1dc/0x258 > free_unref_page_commit.isra.114+0xb0/0xc0 > free_unref_page_list+0x144/0x198 > release_pages+0x8c/0x2bc > __pagevec_release+0x38/0x48 > pagevec_release+0x14/0x20 > shmem_undo_range+0x23c/0x49c > shmem_truncate_range+0x38/0x58 > shmem_evict_inode+0xd4/0x1e8 > evict+0xb8/0x174 > iput+0x178/0x1c0 > dentry_unlink_inode+0x120/0x124 > __dentry_kill+0x98/0x170 > dput+0x88/0x140 > __fput+0x184/0x1e8 > delayed_fput+0x40/0x54 > process_one_work+0x1a4/0x294 > worker_thread+0x1ec/0x284 > kthread+0xf0/0x100 > ret_from_fork+0x10/0x18 > ---[ end trace ebdfde5c0fdc7511 ]--- > ------------[ cut here ]------------ >