Re: [PATCH] parisc: Try to fix random segmentation faults in package builds

On 2024-06-04 13:08, John David Anglin wrote:
> On 2024-06-04 11:07 a.m., matoro wrote:
> > Thanks a ton Dave, I've applied this on top of 6.9.2 and also think I'm seeing improvement!  No panics yet. I have a couple of weeks' worth of package testing to catch up on, so I'll report if I see anything!
> >
> > I've seen a few warnings in my dmesg while testing, although I didn't see any immediately corresponding failures.  Any danger?
>
> We have determined that most of the warnings arise from pages that have been swapped out.  In most cases, it seems these pages have been flushed to memory before the pte is changed to a swap pte.  There might be issues for pages that have been cleared.  It is possible the random faults aren't related to the warning I added for pages with an invalid pfn in flush_cache_page_if_present.  The only thing I know for certain is that there is no way to flush these pages on parisc other than flushing the whole cache.
>
> My c8000 has run almost two weeks without any random faults.  On the other hand, Helge has two machines that
> frequently fault and generate these warnings.
>
> Flushing the whole cache in flush_cache_mm and flush_cache_range might eliminate the random faults, but
> there will be a significant performance hit.
>
> Dave
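
Dave's explanation boils down to a per-page decision: a page with a usable pfn can be flushed on its own, while a swapped-out page cannot, so the only choices are to warn or to flush the whole cache. The C sketch below is a user-space model of that decision under stated assumptions, not the code in arch/parisc/kernel/cache.c. `struct model_pte`, `model_flush_page` and `model_flush_range` are hypothetical names, the `present` flag stands in for "this entry is a swap pte", and the warn-and-skip branch is only assumed to approximate what the current patch does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; this is NOT arch/parisc/kernel/cache.c. */

/* Simplified page-table entry; not the kernel's pte_t layout. */
struct model_pte {
    bool present;        /* false models a swap pte after swap-out */
    unsigned long pfn;   /* page frame number, meaningful only if present */
};

enum flush_action {
    FLUSH_PAGE,  /* flush one page through its pfn */
    FLUSH_WARN,  /* no usable pfn: warn and skip (assumed current behavior) */
    FLUSH_ALL    /* no usable pfn: flush the whole cache instead */
};

/* Per-page decision under the model's assumptions. */
static enum flush_action model_flush_page(const struct model_pte *pte,
                                          bool whole_cache_fallback)
{
    if (pte->present)
        return FLUSH_PAGE;
    return whole_cache_fallback ? FLUSH_ALL : FLUSH_WARN;
}

/*
 * Range decision: returns the number of per-page flushes issued and sets
 * *did_flush_all when a whole-cache flush was chosen. The loop stops at
 * that point because a whole-cache flush also covers the remaining pages.
 */
static size_t model_flush_range(const struct model_pte *ptes, size_t n,
                                bool whole_cache_fallback,
                                bool *did_flush_all, size_t *warnings)
{
    size_t page_flushes = 0;

    *did_flush_all = false;
    *warnings = 0;
    for (size_t i = 0; i < n; i++) {
        switch (model_flush_page(&ptes[i], whole_cache_fallback)) {
        case FLUSH_PAGE:
            page_flushes++;
            break;
        case FLUSH_WARN:
            (*warnings)++;
            break;
        case FLUSH_ALL:
            *did_flush_all = true;
            return page_flushes;
        }
    }
    return page_flushes;
}
```

With `whole_cache_fallback` false, a range containing one swapped-out page produces one warning and per-page flushes for the rest. With it true, the first such page triggers a single whole-cache flush, which is the correctness-for-performance trade Dave describes for flush_cache_mm and flush_cache_range.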

Unfortunately, a few of these faults tripped today after ~4 days of uptime, with corresponding random segfaults. One of the WARNs was emitted shortly before, though not for the same PID. I reattempted the build twice, and it randomly segfaulted all three times. As usual, I had to reboot to get the machine out of the bad state.

[Mon Jun 10 14:26:20 2024] ------------[ cut here ]------------
[Mon Jun 10 14:26:20 2024] WARNING: CPU: 1 PID: 26453 at arch/parisc/kernel/cache.c:624 flush_cache_page_if_present+0x1a4/0x330
[Mon Jun 10 14:26:20 2024] Modules linked in: nfnetlink af_packet overlay loop nfsv4 dns_resolver nfs lockd grace sunrpc netfs autofs4 binfmt_misc sr_mod ohci_pci cdrom ehci_pci ohci_hcd ehci_hcd tg3 usbcore pata_cmd64x ipmi_si hwmon usb_common ipmi_devintf libata libphy nls_base ipmi_msghandler
[Mon Jun 10 14:26:20 2024] CPU: 1 PID: 26453 Comm: ld.so.1 Tainted: G W 6.9.3-gentoo-parisc64 #1
[Mon Jun 10 14:26:20 2024] Hardware name: 9000/800/rp3440

[Mon Jun 10 14:26:20 2024]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[Mon Jun 10 14:26:20 2024] PSW: 00001000000001001111100100001111 Tainted: G W
[Mon Jun 10 14:26:20 2024] r00-03 000000ff0804f90f 000000004106b280 00000000402090bc 000000007f4c85f0
[Mon Jun 10 14:26:20 2024] r04-07 0000000040f99a80 00000000f855d000 00000000561b6360 000000000800000f
[Mon Jun 10 14:26:20 2024] r08-11 0000000c009674de 0000000000000000 0000004100b2e39c 000000007f4c81c0
[Mon Jun 10 14:26:20 2024] r12-15 00000000561b6360 0000004100b2e330 0000000000000002 0000000000000000
[Mon Jun 10 14:26:20 2024] r16-19 0000000040f50360 fffffffffffffff4 000000007f4c8108 0000000000000003
[Mon Jun 10 14:26:20 2024] r20-23 0000000000001a46 0000000011b81000 ffffffffc0000000 00000000f859d000
[Mon Jun 10 14:26:20 2024] r24-27 0000000000000000 000000000800000f 0000004100b2e3a0 0000000040f99a80
[Mon Jun 10 14:26:20 2024] r28-31 0000000000000000 000000007f4c8670 000000007f4c86a0 0000000000000000
[Mon Jun 10 14:26:20 2024] sr00-03 000000000604d000 000000000604d000 0000000000000000 000000000604d000
[Mon Jun 10 14:26:20 2024] sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000

[Mon Jun 10 14:26:20 2024] IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000040209104 0000000040209108
[Mon Jun 10 14:26:20 2024] IIR: 03ffe01f ISR: 0000000000000000 IOR: 0000000000000000
[Mon Jun 10 14:26:20 2024] CPU: 1 CR30: 00000001e700e780 CR31: fffffff0f0e05ee0
[Mon Jun 10 14:26:20 2024]  ORIG_R28: 00000000414cab90
[Mon Jun 10 14:26:20 2024]  IAOQ[0]: flush_cache_page_if_present+0x1a4/0x330
[Mon Jun 10 14:26:20 2024]  IAOQ[1]: flush_cache_page_if_present+0x1a8/0x330
[Mon Jun 10 14:26:20 2024]  RP(r2): flush_cache_page_if_present+0x15c/0x330
[Mon Jun 10 14:26:20 2024] Backtrace:
[Mon Jun 10 14:26:20 2024]  [<000000004020b110>] flush_cache_range+0x138/0x158
[Mon Jun 10 14:26:20 2024]  [<00000000405fdfc8>] change_protection+0x134/0xb78
[Mon Jun 10 14:26:20 2024]  [<00000000405feb4c>] mprotect_fixup+0x140/0x478
[Mon Jun 10 14:26:20 2024] [<00000000405ff15c>] do_mprotect_pkey.constprop.0+0x2d8/0x5f0
[Mon Jun 10 14:26:20 2024]  [<00000000405ff4a4>] sys_mprotect+0x30/0x60
[Mon Jun 10 14:26:20 2024]  [<0000000040203fbc>] syscall_exit+0x0/0x10

[Mon Jun 10 14:26:20 2024] ---[ end trace 0000000000000000 ]---

[Mon Jun 10 14:28:04 2024] do_page_fault() command='ld.so.1' type=15 address=0x161236a0 in libc.so[f8b9c000+1b6000]
trap #15: Data TLB miss fault, vm_start = 0x4208e000, vm_end = 0x420af000
[Mon Jun 10 14:28:04 2024] CPU: 0 PID: 26681 Comm: ld.so.1 Tainted: G W 6.9.3-gentoo-parisc64 #1
[Mon Jun 10 14:28:04 2024] Hardware name: 9000/800/rp3440

[Mon Jun 10 14:28:04 2024]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[Mon Jun 10 14:28:04 2024] PSW: 00000000000001100000000000001111 Tainted: G W
[Mon Jun 10 14:28:04 2024] r00-03 000000000006000f 00000000f8d584a8 00000000f8c46e33 0000000000000028
[Mon Jun 10 14:28:04 2024] r04-07 00000000f8d54660 00000000f8d54648 0000000000000020 000000000001ab91
[Mon Jun 10 14:28:04 2024] r08-11 00000000f8d54654 00000000f8d5bf78 0000000000000005 00000000f9ad87c8
[Mon Jun 10 14:28:04 2024] r12-15 0000000000000000 0000000000000000 000000000000003f 00000000000003e9
[Mon Jun 10 14:28:04 2024] r16-19 000000000001a000 000000000001a000 000000000001a000 00000000f8d56ca8
[Mon Jun 10 14:28:04 2024] r20-23 0000000000000000 00000000f8c46bcc 000000000001a2d8 00000000ffffffff
[Mon Jun 10 14:28:04 2024] r24-27 0000000000000000 0000000000000020 00000000f8d54648 000000000001a000
[Mon Jun 10 14:28:04 2024] r28-31 0000000000000001 0000000016123698 00000000f9ad8cc0 00000000f9ad8c2c
[Mon Jun 10 14:28:04 2024] sr00-03 0000000006069400 0000000006069400 0000000000000000 0000000006069400
[Mon Jun 10 14:28:04 2024] sr04-07 0000000006069400 0000000006069400 0000000006069400 0000000006069400

[Mon Jun 10 14:28:04 2024]       VZOUICununcqcqcqcqcqcrmunTDVZOUI
[Mon Jun 10 14:28:04 2024] FPSR: 00000000000000000000000000000000
[Mon Jun 10 14:28:04 2024] FPER1: 00000000
[Mon Jun 10 14:28:04 2024] fr00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[Mon Jun 10 14:28:04 2024] fr04-07 3fbc58dcd6e825cf 41d98fdb92c00000 00001d29b5e9bfb4 41d999952df718f9
[Mon Jun 10 14:28:04 2024] fr08-11 ffe3d998c543273c ff60537aba025d00 004698b61bd9b9ee 000527c1bed53af7
[Mon Jun 10 14:28:04 2024] fr12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[Mon Jun 10 14:28:04 2024] fr16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[Mon Jun 10 14:28:04 2024] fr20-23 0000000000000000 0000000000000000 0000000000000020 0000000000000000
[Mon Jun 10 14:28:04 2024] fr24-27 0000000000000003 0000000000000000 3d473181aed58d64 bff0000000000000
[Mon Jun 10 14:28:04 2024] fr28-31 3fc999b324f10111 057028cc5c564e70 dbc91a3f6bd13476 02632fb493c76730

[Mon Jun 10 14:28:04 2024] IASQ: 0000000006069400 0000000006069400 IAOQ: 00000000f8c44063 00000000f8c44067
[Mon Jun 10 14:28:04 2024] IIR: 0fb0109c ISR: 0000000006069400 IOR: 00000000161236a0
[Mon Jun 10 14:28:04 2024] CPU: 0 CR30: 00000001e70099e0 CR31: fffffff0f0e05ee0
[Mon Jun 10 14:28:04 2024]  ORIG_R28: 0000000000000000
[Mon Jun 10 14:28:04 2024]  IAOQ[0]: 00000000f8c44063
[Mon Jun 10 14:28:04 2024]  IAOQ[1]: 00000000f8c44067
[Mon Jun 10 14:28:04 2024]  RP(r2): 00000000f8c46e33



