Hi Joachim,
On 25/7/19 6:26 am, Joachim Dietrich wrote:
There are no separate supervisor and user stack pointers on older
Coldfire CPUs such as the mcf5329.
We, therefore, must enable the use of software copies which is done by
selecting CONFIG_COLDFIRE_SW_A7, else the first user process has a
wrong stack pointer.
Signed-off-by: Joachim Dietrich <jo.dietrich@xxxxxx>
Signed-off-by: Geert Uytterhoeven <geert@xxxxxxxxxxxxxx>
---
I (Geert) am not that super familiar with the CONFIG_COLDFIRE_SW_A7
handling. According to Section 3.2.3 ("Supervisor/User Stack Pointers
(A7 and OTHER_A7)") of MCF5329RM.pdf[1], mcf5329 does have separate
stack pointers, but they're not USP and SSP, like on classic m68k and
newer Coldfire parts.
Yes, I know he's right. So my description is probably not correct. But
I'm a little bit confused about the stack pointer handling of the v3
coldfire, because the reference manual of the mcf5329 says: "To
support dual stack pointers, the following two supervisor instructions
are included in the ColdFire instruction set architecture to load/store
the USP:
move.l Ay,USP;move to USP
move.l USP,Ax;move from USP"
And that's what the CONFIG_COLDFIRE_SW_A7 code does in entry.h.
Furthermore, in earlier versions of uclinux, e.g. kernel 2.6.26, this was
the default handling for the mcf5329.

The stack pointer handling (or really the presence of A7 and other-A7)
can't be determined only by knowing the version core. There are v3
cores that don't have both A7 registers and some that do.
That was certainly true of older kernels. I added support for using
both A7 registers in later kernels (I don't remember the exact version
in which I included it). The addition of 2 A7 registers and supporting
instructions was introduced in the ColdFire ISA_A+ version of the
instruction set. (So generally speaking old ColdFire parts don't have
them, newer ones do).
That support introduced the CONFIG_COLDFIRE_SW_A7 define.
Hence mcf5329 differs from e.g. mcf5206[2], which has a single unified
stack pointer, which is what CONFIG_COLDFIRE_SW_A7 is designed for.
I don't know if this applies to all mcf532x variants.
It's quite possible CONFIG_COLDFIRE_SW_A7 is the correct solution for
mcf5329, or perhaps it needs some special handling for A7 vs. OTHER_A7?
Perhaps there's a better or more correct handling for the stack
pointers, but without CONFIG_COLDFIRE_SW_A7 my kernel (4.19.15) fails at
rdusp() and wdusp() in processor.h and my first user process has a
wrong sp.
The 5329 supports ColdFire ISA_A+ so it definitely has the A7
and other-A7 support. And it is implemented the same on all ColdFire
parts that support it.
The trick with the dual A7 support is that you have to enable it
in the Cache Control Register (CACR), the EUSP bit. Otherwise you
get traps on those move to and from USP instructions - like what
you are seeing.
So my guess is that CACR is not being set up properly. It is set via
the startup code in arch/m68k/coldfire/head.S - based on whether
CONFIG_COLDFIRE_SW_A7 is defined (see
arch/m68k/include/asm/m53xxacr.h).
Can you double check that the CACR register is being set with
the EUSP bit (bit 4) set?
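
As a rough illustration of that check (not kernel code), the test amounts to masking one bit out of whatever value the startup code writes to CACR. The name EXAMPLE_CACR_EUSP and the helper below are hypothetical, and the bit position is simply the "bit 4" quoted above; the real definition should be taken from arch/m68k/include/asm/m53xxacr.h and the reference manual.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical mask for the EUSP bit. The position follows the "bit 4"
 * wording in this thread and is an assumption; verify it against the
 * reference manual and m53xxacr.h before relying on it.
 */
#define EXAMPLE_CACR_EUSP (UINT32_C(1) << 4)

/* Returns true if the given CACR value enables the separate user A7. */
static bool example_cacr_has_eusp(uint32_t cacr_value)
{
	return (cacr_value & EXAMPLE_CACR_EUSP) != 0;
}
```

Feeding it the constant that head.S loads into CACR would show whether the dual A7 support is requested at boot.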
OK. I will do that. But at first glance everything looks right (in
the
code). I'm afraid I'm going to need more analysis on the behavior.
I'll get back to you.
Looking into this further I think I can see at least one problem -
directly related to v3 core cache handling. And this could be
what you are seeing.
The CACR register also has bits to invalidate and flush the caches.
And we use that in the cache control functions in
arch/m68k/include/asm/cacheflush_no.h
For the v3 cores with the definitions set the way they are we
will be over-writing the CACR value - losing the EUSP bit setting.
Can you try the patch below?
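
The overwrite described above can be sketched as follows. This is not the patch itself, only a minimal model of the idea under stated assumptions: all EXAMPLE_* names and bit positions are hypothetical placeholders, not the definitions from m53xxacr.h or cacheflush_no.h.

```c
#include <stdint.h>

/* Hypothetical bit masks, for illustration only. */
#define EXAMPLE_CACR_EC    (UINT32_C(1) << 31) /* cache enable */
#define EXAMPLE_CACR_CINVA (UINT32_C(1) << 24) /* invalidate whole cache */
#define EXAMPLE_CACR_EUSP  (UINT32_C(1) << 4)  /* separate user A7 */

/* Value assumed to be configured once at startup, including EUSP. */
#define EXAMPLE_CACHE_MODE (EXAMPLE_CACR_EC | EXAMPLE_CACR_EUSP)

/*
 * Problem case: the value written for a flush carries only the
 * invalidate bit, so the write clears EUSP as a side effect.
 */
static uint32_t example_flush_value_overwriting(void)
{
	return EXAMPLE_CACR_CINVA;
}

/*
 * Preserving case: the value written for a flush keeps the configured
 * mode bits and only adds the invalidate bit on top.
 */
static uint32_t example_flush_value_preserving(void)
{
	return EXAMPLE_CACHE_MODE | EXAMPLE_CACR_CINVA;
}
```

In the first variant EUSP is gone after the first cache flush, which would make the move to/from USP instructions trap from then on; in the second it survives every flush.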
The patch will maintain the CACR value while flushing caches.
The most widely used v3 core, the 5307, won't be affected since it
does not support separate A7 registers. The handling of this for
other version cores (v2 and v4) does it correctly already.

I did it, but for me it looks as if the patch has fixed only the described
problem, and the handling doesn't seem to work quite yet. But let me
explain this in more detail:
Without your patch, the kernel starts until the first user process. Let
this be a simple "Hello World", which prints the argv[0] and argv. In
this case usp points to a wrong address; the user process has no
arguments. No chance to start an init process.
With the new patch, the usp of the same "Hello World" seems to point to a
correct address. The arguments are correct. So, I guess the EUSP bit in
CACR is set.
But when I want to start the BusyBox init then I get a trap:
---
...
[2.780000] This architecture does not have kernel memory protection.
[2.780000] Run /sbin/init as init process
init started: BusyBox v1.29.3 (2019-06-04 13:30:06 CEST)
[3.310000] softirq: huh, entered softirq 3 NET_RX (ptrval) with
preempt_count 000001fa, exited with 000001f9?
[3.320000] softirq: huh, entered softirq 3 NET_RX (ptrval) with
preempt_count 00000100, exited with 000000f9?
[3.330000] *** ILLEGAL INSTRUCTION *** FORMAT=4
[3.330000] Current process id is 80347136
[3.330000] BAD KERNEL TRAP: 00000000
[3.330000] Modules linked in:
[3.330000] PC: [<00000000>] (null)
[3.330000] SR: 2708 SP: (ptrval) a2: fffffff4
[3.330000] d0: ffffffff d1: ffffffff d2: 00000000 d3: 00000000
[3.330000] d4: 00000001 d5: 00000003 a0: 00000000 a1: 4099f8bc
[3.330000] Process (null) (pid: 80347136, task= (null))
[3.330000] Frame format=4 eff addr=4004c6a0 pc=fffffff3
[3.330000] Stack from 4099f71c:
[3.330000] 00000003 000] 00002784 4099e000 4004c620
0000031e 4004cc78 04ca0504 00000003 00000001
[3.330000] 00000000 00000000 4099f78c 00000004 00000000
00000003
00000003 00000003
[3.330000] 00000000 4099f83c 00000000 0000031e 00000000
00000000
00000000 4099f798
[3.330000] 4099f798 4004cd1e 04ca0504 00000003 00000001
00000000
00000000 4003092a
[3.330000] 04ca0504 00000003 00000001 00000000 00000004
00002700
00000003 00000001
...
---
Maybe this is another problem, but as I said, with the software usp
everything works fine.
Ok, thanks for trying that. Can you send me the full kernel boot trace
(with failed trap above). And can you send the kernel .config file.
Attached you can find the bootlog as a text file, the kernel config and,
only for completeness, the board specific patches. But don't be confused
by the preempt-rt patch. I got the same behavior (and log) even
without the rt patch.
Thanks for sending those through.
Analyzing the generated code I can see that there is at least
one problem with the last patch I sent. It is not disabling
the cache at init time. Attached is an updated patch - though
I don't think it is going to fix the problem.
Still thinking about it...
OK. Thanks for your help!
I will try the patch after I get back from my vacation on August 5th.
In addition, I will analyse the described behavior, or rather the problem,
more deeply to get more information about the cause.
I'll get back to you then.
No worries, thanks.
I have quite a few ColdFire boards with various parts, but I
don't have a 5329. So I can't debug directly.
Regards
Greg