On Thu, Aug 18, 2022 at 07:14:12PM +0200, Max Schulze wrote:
>
> On 15.08.22 16:22, Will Deacon wrote:
> >>> [20:47:09] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> >>> [20:48:46] BUG: Bad page map in process projecta pte:1110111111111111 pmd:800000001c40003
> >>> [20:48:46] addr:0000007fa1c00000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c00
> >>> [20:48:46] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
> >
> >> I hate to say it, but this all looks like memory corruption hitting the
> >> page table and possibly the 'struct page' array to me :/
> >
> > Perhaps a note on the occcurence: across devices, the "bad page map"
> > differs at pte, but somehow is mostly consistent at pmd:800000001c40003
> > (though I have seen 800000001c02003 and 800000001c40003). Is this some
> > "magic value"? Because when not, I think it would be highly unlikely to
> > be the hardware.
> >
> > It is not only my program that has the problem, I have seen
> >
> > [Sun Aug 14 17:30:38 2022] BUG: Bad page map in process llvmpipe-3 pte:262d2626292a2627 pmd:800000001c01003
> >
> > and
> > [Sat Aug 13 11:53:43 2022] BUG: Bad page map in process Xorg:disk$1 pte:a098a09aa29ea8a4 pmd:800000001c01003
> > [Sat Aug 13 11:53:43 2022] addr:00000055a961e000 vm_flags:200100073 anon_vma:ffffff804c07d8f8 mapping:0000000000000000 index:55a961e
> > [Sat Aug 13 11:53:43 2022] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
> >
> [..]
>
> I am able to reproduce this on 6.0.0-rc1 .
> It looks like vm_normal_page does not recognize the page as being "normal" (?).
> (mm/memory.c)

I think the issue is much more fundamental than that; you appear to have
page-table corruption (for example, "pte:262d2626292a2627" and
"pte:1110111111111111" are definitely corrupted) and so anything dealing
with 'struct page' derived from the physical address in the pte is going
to go wonky.

From the logs here, the pmds look ok but these are the pte values I
spotted:

	0x1110111111111111
	0x262d2626292a2627
	0xa098a09aa29ea8a4
	0x212725231f242323
	0x2626262023222323

which don't seem to correspond to any sort of poison, but are possibly
artifacts of repeated patterns with random bits cleared?

Will
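
For anyone wanting to eyeball the "repeated pattern" observation, here is
a minimal Python sketch that splits each of the five pte values listed
above into its eight bytes and reports how far apart the largest and
smallest byte are. The helper names (pte_bytes, byte_spread) are made up
for illustration and are not kernel functions; the sketch only looks at
the raw numbers and assumes nothing about the pte layout.

```python
# The five pte values reported in the "Bad page map" logs above.
PTE_VALUES = [
    0x1110111111111111,
    0x262D2626292A2627,
    0xA098A09AA29EA8A4,
    0x212725231F242323,
    0x2626262023222323,
]


def pte_bytes(value):
    """Return the eight bytes of a 64-bit value, most significant first."""
    return list(value.to_bytes(8, byteorder="big"))


def byte_spread(value):
    """Difference between the largest and smallest byte in the value."""
    parts = pte_bytes(value)
    return max(parts) - min(parts)


if __name__ == "__main__":
    for v in PTE_VALUES:
        hex_bytes = " ".join(f"{b:02x}" for b in pte_bytes(v))
        print(f"{v:#018x}  bytes: {hex_bytes}  spread: {byte_spread(v):#04x}")
```

In each value the bytes sit close to one another, which would be
consistent with data built from a repeating byte pattern rather than with
a valid page-table entry. It says nothing about where that data came from.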