On 17.07.2021 at 17:41, Michael Schmitz wrote:
On 16.07.2021 at 14:03, Michael Schmitz wrote:
Hi Christoph,
On 15/07/21 7:26 am, Michael Schmitz wrote:
I've got a vague recollection that I've seen weird crashes in the past
related to temperature extremes (we've had a few unusually cold days
in our parts just now), so I've gone back to a kernel from the switch
stack / refactoring exit tests (which ran the stress tests fine
earlier) to rule that one out. Looking good so far, so I'm beginning to
wonder: do we need to introduce get_fc() and use it to restore the
original sfc/dfc instead of assuming USER_DATA is always correct?
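To make the question concrete, here is a toy C model (not kernel code) of one way that restoring a fixed USER_DATA could go wrong. If a path that has switched to supervisor data space is interrupted by a nested access that unconditionally "restores" USER_DATA, the outer path continues with the wrong function code; a save/restore pair built on a get_fc() helper keeps it intact. get_fc(), set_fc(), the FC_* names and the global sfc/dfc variables are hypothetical stand-ins for movec on the real control registers; only the function code values 1 (user data) and 5 (supervisor data) are architectural. Whether such nesting actually happens in the code under test is an assumption, not something these logs show.

```c
/* Toy model only. The values are the 68k function codes
 * (1 = user data, 5 = supervisor data); the names are illustrative. */
enum { FC_USER_DATA = 1, FC_SUPER_DATA = 5 };

/* Hypothetical stand-ins for the SFC/DFC control registers, which real
 * code can only reach via movec in supervisor mode. */
static int sfc = FC_USER_DATA, dfc = FC_USER_DATA;

static int get_fc(void) { return dfc; }          /* hypothetical helper */
static void set_fc(int fc) { sfc = dfc = fc; }   /* sets both registers */

/* Nested access that restores by assuming USER_DATA was active. */
static void inner_assume_user(void)
{
    set_fc(FC_SUPER_DATA);
    /* ... moves accesses would happen here ... */
    set_fc(FC_USER_DATA);
}

/* Same nested access, but restoring whatever was active on entry. */
static void inner_save_restore(void)
{
    int saved = get_fc();
    set_fc(FC_SUPER_DATA);
    /* ... moves accesses would happen here ... */
    set_fc(saved);
}

/* Outer path: switches to supervisor data space, is "interrupted" by a
 * nested access, and reports the function code it sees afterwards. */
static int outer_then(void (*inner)(void))
{
    set_fc(FC_SUPER_DATA);
    inner();                 /* models the nested access mid-sequence */
    int seen = get_fc();
    set_fc(FC_USER_DATA);    /* outer path finished */
    return seen;
}
```

With inner_assume_user the outer path comes back to FC_USER_DATA (wrong), with inner_save_restore it still sees FC_SUPER_DATA.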
No crashes with the known good kernel after over a day of stress testing
- I'll try Andreas' patch once the current test run has completed.
One thing I noticed with either your final or your v2 patch series - as
far as the tests ran at all, run times increased by 30%. That's a lot.
With Andreas' patch applied, the run time increase is now less severe
(11-13%). I'll repeat that a few more times but it's looking a lot
better so far. No instruction format errors seen anymore.
Alas - got another one:
[124760.720000] *** FORMAT ERROR *** FORMAT=0
[124760.740000] Current process id is 1108
[124760.750000] BAD KERNEL TRAP: 00000000
[124760.770000] Modules linked in: atari_scsi ne 8390p
[124760.800000] PC: [<00002a8c>] resume_userspace+0x14/0x16
[124760.820000] SR: 2200 SP: 96ae8faf a2: efc67932
[124760.850000] d0: 00000047 d1: 0000000a d2: 00000001 d3: 80007c40
[124760.880000] d4: 00000000 d5: 80004326 a0: 80009d78 a1: efc678f4
[124760.890000] Process syslogd (pid: 1108, task=742727e0)
[124760.920000] Frame format=0
[124760.930000] Stack from 00929fa4:
[124760.930000] 02048000 252cb008 0eee0749 660000c2 00929e18 00000000 00000001 0003469a
[124760.930000] 00033fd8 0002f7a6 00000006 00000000 00000001 0003469a 00513d00 00ad320c
[124760.930000] 00929e2c 0003e514 006616e0 00000003 00000000 002c6f0a 00035946
[124761.030000] Call Trace: [<0003469a>] get_work_pool+0x0/0x38
[124761.060000] [<00033fd8>] find_worker_executing_work+0x0/0x40
[124761.090000] [<0002f7a6>] sys_rt_sigprocmask+0x5a/0x9a
[124761.100000] [<0003469a>] get_work_pool+0x0/0x38
[124761.120000] [<0003e514>] wake_up_process+0x12/0x16
[124761.150000] [<002c6f0a>] printk+0x0/0x18
[124761.170000] [<00035946>] __queue_work+0x1a8/0x1be
[124761.190000]
[124761.200000] Code: 1029 0007 660c 4cdf 073e 201f 588f dfdf <4e73> 254f 040c e308 660a 487a ffe0 60ff 002d 26ae 598f 48e7 031e 486f 001c 61ff
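The log reports FORMAT=0 and Frame format=0. On the 68010 and later, every exception stack frame carries a format/vector word: bits 15 to 12 hold the frame format, bits 11 to 0 the vector offset (vector number times 4). The format error exception is vector 14 (offset 0x038) and is taken when rte finds a frame it cannot accept, for instance one with an unrecognised format code. A minimal decoding sketch; the struct and function names are illustrative, and which frame the kernel's FORMAT= value describes is not established here.

```c
#include <stdint.h>

/* Fields of a 68010+ exception frame's format/vector word. */
struct fmt_vec {
    unsigned format;   /* bits 15..12: stack frame format code */
    unsigned vector;   /* vector number = (bits 11..0) / 4 */
};

/* Split a raw format/vector word into its two fields. */
static struct fmt_vec decode_fmt_vec(uint16_t w)
{
    struct fmt_vec fv = {
        (unsigned)(w >> 12),
        (unsigned)((w & 0x0fffu) >> 2),
    };
    return fv;
}

/* 68k exception vector for the format error (offset 0x038). */
enum { VEC_FORMAT_ERROR = 14 };
```

For example, 0x0038 decodes to format 0 with vector 14 (format error), and 0xb008 to format 0xb with vector 2 (bus error).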
The faulting instruction is the 'rte' at the end of resume_userspace
from our entry.S.
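The rte can be read straight off the oops: in the Code: line, the word in angle brackets is the one at the faulting PC, and 0x4e73 is the fixed 68k encoding of rte. A small sketch that extracts that word and names a handful of operand-less 68k opcodes; the helper names are illustrative, and the lookup deliberately covers only these fixed encodings rather than acting as a disassembler.

```c
#include <stdlib.h>
#include <string.h>

/* Return the hex word enclosed in <...> in an oops "Code:" line,
 * or -1 if there is no well-formed marked word. */
static long faulting_word(const char *code_line)
{
    const char *lt = strchr(code_line, '<');
    if (!lt)
        return -1;
    char *end;
    long w = strtol(lt + 1, &end, 16);
    if (end == lt + 1 || *end != '>')
        return -1;
    return w;
}

/* Names for a few fixed, operand-less 68k opcodes; "?" otherwise. */
static const char *fixed_opcode_name(long w)
{
    switch (w) {
    case 0x4e70: return "reset";
    case 0x4e71: return "nop";
    case 0x4e73: return "rte";
    case 0x4e75: return "rts";
    case 0x4e77: return "rtr";
    default:     return "?";
    }
}
```

Fed the Code: line above, faulting_word() yields 0x4e73 and fixed_opcode_name() maps it to "rte".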
Any idea what's gone wrong this time, Andreas?
(All processes except probably syslogd kept running, and my tests
completed OK. Rerunning them now to see what else I get...)
Cheers,
Michael