Hi Linus,
On 24/08/21 05:59, Linus Torvalds wrote:
> On Sun, Aug 22, 2021 at 12:34 PM Michael Schmitz <schmitzmic@xxxxxxxxx> wrote:
>> Got this overnight:
>> [536154.200000] *** FORMAT ERROR *** FORMAT=0
>> [536154.210000] Current process id is 4656
>> [536154.230000] BAD KERNEL TRAP: 00000000
>> [536154.240000] Modules linked in: atari_scsi ne 8390p [last unloaded: atari_scsi]
>> [536154.260000] PC: [<00002a8c>] resume_userspace+0x14/0x16
>> [536154.270000] SR: 2208 SP: 977bd1be a2: 8009b5e8
>> [536154.290000] d0: 8009b5e8 d1: cfcfcfcf d2: 00000000 d3: ffffffff
>> [536154.300000] d4: 00000000 d5: 00000000 a0: 8008a108 a1: 8009b7df
>> [536154.320000] Process savelog (pid: 4656, task=e49aa246)
>> [536154.330000] Frame format=0
>> [536154.340000] Stack from 00cc5fa4:
>> [536154.340000] 02088004 3666b008 1c0eb209 007eb5e8 8006a2d0 efaec378 8004366c 61ff61ff
>> [536154.340000] 8006a2d4 8006a2d2 00000000 030dfffb 0044fffa 0e000000 fffa1a00 fffa1c00
>> [536154.340000] fffa1e00 fffb0e40 fffb0e80 00049b66 00000040 005f5800 00000001
> Strange. If I read that stack frame correctly, that seems to be an
> exception frame of type 0xb ("Long Bus Cycle").
Not sure where you get the 0xb from - the frame format I see is 0. 0xb
would print additional information before the stack dump. Format 0
doesn't appear in the stack frame struct definition in asm/traps.h.
I have no 68k processor manual at hand, so no idea whether frame format
0 even exists.
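For reference, a minimal sketch of how the format/vector word of a 68010-and-later exception frame splits into a frame format and a vector offset. The helper names are hypothetical (this is not kernel code), and treating 0xb008 (the low half of the second longword in the stack dump) as the format/vector word is only an assumption about where the frame starts in that dump. In the 68020/68030 manuals, format 0x0 is the ordinary four-word frame and format 0xb is the long bus cycle fault frame.

```c
#include <stdint.h>

/*
 * Format/vector word as stacked after SR and PC on 68010 and later:
 *   bits 15-12  frame format
 *   bits 11-0   vector offset (vector number * 4)
 * Helper names are illustrative only.
 */
static unsigned int frame_format(uint16_t fv)
{
	return fv >> 12;
}

static unsigned int frame_vector_offset(uint16_t fv)
{
	return fv & 0x0fff;
}

static unsigned int frame_vector_number(uint16_t fv)
{
	/* Each vector table entry is 4 bytes wide. */
	return frame_vector_offset(fv) / 4;
}
```

Under that assumption, frame_format(0xb008) gives 0xb and frame_vector_number(0xb008) gives 2 (bus error), which would fit a "Long Bus Cycle" reading of the dump, whereas the kernel message itself reports format 0.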
> Plus the frame content is then apparently corrupted enough that the
> rte causes an exception on trying to restore it.
> None of which makes sense or seems to have much at all to do with any
> of these patches. Yes, we mess with the exception frame, but only for
> fork(), and while "copy_process()" doesn't set any frame type, I see
> only two cases:
> - the kernel thread one does a "memset()" to clear it, so you should
>   end up with frame type 0
I need to reread your description of how the kernel thread creation
works - ret_from_kernel_thread() uses the same return path that
resume_userspace() appears on, so we might end up here through that
path. I've an idea how to test this, but that might be a little academic...
> - the user thread case copies the original frame format (which I
>   think is just the system call frame from the TRAP instruction).
> Are you 100% sure your hardware is stable?
I've always thought so. But going back through quite a few years of
console log dumps, I now see that this format error has happened as
early as 2017 (kernel 4.10.0-rc2). So it appears this is not a
regression caused by Christoph's patches after all.
Sorry for causing all this confusion...
Cheers,
Michael