(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Thu, 01 Aug 2019 22:34:16 +0000 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=204407 > > Bug ID: 204407 > Summary: Bad page state in process Xorg > Product: Memory Management > Version: 2.5 > Kernel Version: 5.3.0-rc2-00013 > Hardware: All > OS: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Page Allocator > Assignee: akpm@xxxxxxxxxxxxxxxxxxxx > Reporter: petr@xxxxxxxxxxxxxx > Regression: No > > Created attachment 284081 > --> https://bugzilla.kernel.org/attachment.cgi?id=284081&action=edit > dmesg > > I've upgraded from 5.3-rc1 to 5.3-rc2, and when I started X server, system > became unhappy: > > [259701.387365] BUG: Bad page state in process Xorg pfn:2a300 > [259701.393593] page:ffffea0000a8c000 refcount:0 mapcount:-128 > mapping:0000000000000000 index:0x0 > [259701.402832] flags: 0x2000000000000000() > [259701.407426] raw: 2000000000000000 ffffffff822ab778 ffffea0000a8f208 > 0000000000000000 > [259701.415900] raw: 0000000000000000 0000000000000003 00000000ffffff7f > 0000000000000000 > [259701.424373] page dumped because: nonzero mapcount > [259701.429847] Modules linked in: af_packet xt_REDIRECT nft_compat x_tables > nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv4 nf_tables > nfnetlink ppdev parport fuse autofs4 binfmt_misc uinput twofish_generic > twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common > camellia_generic camellia_aesni_avx_x86_64 camellia_x86_64 serpent_avx_x86_64 > serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 > blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic cmac > xcbc rmd160 af_key xfrm_algo rpcsec_gss_krb5 nfsv4 nls_iso8859_2 cifs libarc4 > nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipv6 crc_ccitt > nf_defrag_ipv6 snd_hda_codec_hdmi pktcdvd coretemp hwmon intel_rapl_common > iosf_mbi x86_pkg_temp_thermal snd_hda_codec_realtek snd_hda_codec_generic > ledtrig_audio snd_hda_intel snd_hda_codec crct10dif_pclmul crc32_pclmul > snd_hwdep crc32c_intel snd_hda_core ghash_clmulni_intel snd_pcm_oss uas > iTCO_wdt aesni_intel e1000e > [259701.429873] snd_mixer_oss iTCO_vendor_support aes_x86_64 snd_pcm > crypto_simd ptp mei_me dcdbas lpc_ich sr_mod cryptd usb_storage snd_timer > glue_helper mfd_core input_leds pps_core tpm_tis cdrom i2c_i801 snd mei > tpm_tis_core sg tpm [last unloaded: parport_pc] > [259701.539387] CPU: 10 PID: 4860 Comm: Xorg Tainted: G T > 5.3.0-rc2-64-00013-g03f05a670a3d #69 > [259701.549382] Hardware name: Dell Inc. Precision T3610/09M8Y8, BIOS A16 > 02/05/2018 > [259701.549382] Call Trace: > [259701.549382] dump_stack+0x46/0x60 > [259701.549382] bad_page.cold.28+0x81/0xb4 > [259701.549382] __free_pages_ok+0x236/0x240 > [259701.549382] __ttm_dma_free_page+0x2f/0x40 > [259701.549382] ttm_dma_unpopulate+0x29b/0x370 > [259701.549382] ttm_tt_destroy.part.6+0x44/0x50 > [259701.549382] ttm_bo_cleanup_memtype_use+0x29/0x70 > [259701.549382] ttm_bo_put+0x225/0x280 > [259701.549382] ttm_bo_vm_close+0x10/0x20 > [259701.549382] remove_vma+0x20/0x40 > [259701.549382] __do_munmap+0x2da/0x420 > [259701.549382] __vm_munmap+0x66/0xc0 > [259701.549382] __x64_sys_munmap+0x22/0x30 > [259701.549382] do_syscall_64+0x5e/0x1a0 > [259701.549382] ? prepare_exit_to_usermode+0x75/0xa0 > [259701.549382] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [259701.549382] RIP: 0033:0x7f504d0ec1d7 > [259701.549382] Code: 10 e9 67 ff ff ff 0f 1f 44 00 00 48 8b 15 b1 6c 0c 00 f7 > d8 64 89 02 48 c7 c0 ff ff ff ff e9 6b ff ff ff b8 0b 00 00 00 0f 05 <48> 3d 01 > f0 ff ff 73 01 c3 48 8b 0d 89 6c 0c 00 f7 d8 64 89 01 48 > [259701.549382] RSP: 002b:00007ffe529db138 EFLAGS: 00000206 ORIG_RAX: > 000000000000000b > [259701.549382] RAX: ffffffffffffffda RBX: 0000564a5eabce70 RCX: > 00007f504d0ec1d7 > [259701.549382] RDX: 00007ffe529db140 RSI: 0000000000400000 RDI: > 00007f5044b65000 > [259701.549382] RBP: 0000564a5eafe460 R08: 000000000000000b R09: > 000000010283e000 > [259701.549382] R10: 0000000000000001 R11: 0000000000000206 R12: > 0000564a5e475b08 > [259701.549382] R13: 0000564a5e475c80 R14: 00007ffe529db190 R15: > 0000000000000c80 > [259701.707238] Disabling lock debugging due to kernel taint I assume the above is misbehaviour in the DRM code? > > Also - maybe related, maybe not - I've got three userspace crashes earlier on > this kernel (but never before): > > [77154.886836] iscons.py[12441]: segfault at 2c ip 00000000080cf0b5 sp > 00000000f773fb60 error 4 in python[8048000+11a000] > [77154.898376] Code: 02 0f 84 4a 2e 00 00 8b 4d 08 8b bd 04 ff ff ff 8b 59 38 > 8b 57 20 8b 7b 10 85 ff 0f 84 ee 22 00 00 8b 8d 04 ff ff ff 8b 59 08 <8b> 43 2c > 85 c0 0f 84 3c e3 ff ff 8b 51 34 8b 71 38 8b 79 3c ff 00 > [119529.983163] in.telnetd[616]: segfault at 0 ip 0000555fdfa09a05 sp > 00007ffd8fc05380 error 4 in in.telnetd[555fdfa06000+b000] > [119529.995783] Code: 8d 3d c2 76 00 00 e8 9a 2b 00 00 e9 25 fe ff ff be f2 00 > 00 00 48 8d 3d ac 76 00 00 e8 84 2b 00 00 eb 96 48 8b 05 63 ee 00 00 <0f> b6 00 > e9 fa fe ff ff 44 89 ee 48 8d 3d 8c 76 00 00 e8 64 2b 00 > [120884.183003] iscons.py[10779]: segfault at 2c ip 00000000080d0dc2 sp > 00000000f7702b60 error 4 in python[8048000+11a000] > [120884.195182] Code: 4c 24 08 89 44 24 04 89 3c 24 e8 d9 0b 01 00 8b 55 f0 8b > 7d e8 8b 4d ec 8b 85 04 ff ff ff 89 55 84 89 7d 8c 89 4d 88 8b 50 08 <8b> 7a 2c > 85 ff 0f 84 3c 10 00 00 8b bd 04 ff ff ff 89 f8 8b 57 34 > > I've investigated in.telnetd in detail as I was worried if there is some 0-day > being used on my system - and as far as I could tell, problem is that part of > the .bss turned to zeroes after process did fork/exec - there was NULL in the > variable that cannot have NULL as variable is set to non-NULL value during > in.telnetd initialization. > > iscons crashes are from NULL pointer dereference too, and iscons does lot of > fork/exec as well. hm, that does sound unrelated.