Hi Daniel,
On 13.12.18 02:00, Daniel Schwierzeck wrote:
Am 12.12.18 um 09:18 schrieb Stefan Roese:
Hi!
I've been hunting for a problem for quite some time, where Linux
hangs / crashes in userspace at some point on my MT7688 based
systems. I found that this problem can be avoided (worked around)
by not giving Linux the full memory (by using DT memory node fixup
or mem= kernel cmdline). When reducing this memory by the memory
used by U-Boot (stack pointer minus some KiB value as this is the
"lowest" memory used by U-Boot), then Linux runs just fine.
My first idea was that this issue is cache related (most
likely I-cache). But all tests and debugging in this area did not
fix this issue (even running with caches disabled).
Finally I found that this line in U-Boot makes Linux break:
arch/mips/lib/traps.c:
void trap_init(ulong reloc_addr)
{
	unsigned long ebase = gd->irq_sp;
	...
	write_c0_ebase(ebase);
	...
}
This sets EBase to something like 0x87e9b000 on my system (128MiB).
And Linux then re-uses this value and copies the exception handlers
to this address, overwriting random code and leading to an unstable
system.
So my question now is: how should this be handled on the MT7688
platform instead? One way would be to set EBase back to the
original value (0x80000000) before booting into Linux. Another
solution would be to add some Linux code like board_ebase_setup()
to the MT7688 Linux port.
Since I'm not a real MIPS expert yet, I would really like to get
some advice here on how to best solve this issue. Maybe I missed
something. Comments?
Thanks,
Stefan
The relevant code is in arch/mips/kernel/traps.c:trap_init():
Within the branch if (cpu_has_veic || cpu_has_vint) the kernel
allocates memory for the exception vectors and resets ebase to that memory.
This branch currently is not taken on this SoC (Mediatek / Ralink).
In the else branch ebase is statically assigned to CAC_BASE which should
resolve to 0x80000000 on the Ralink platform. The ebase is only read from
CP0 for MIPS r6 CPUs.
Without CPU_MIPSR2_IRQ_VI being set (as is currently the case), this
is how this function is run:
	if (cpu_has_veic || cpu_has_vint) {
		...
	} else {
		*** this is true for Ralink / Mediatek
		...
		if (cpu_has_mips_r2_r6) {
			if (cpu_has_ebase_wg) {
				...
			} else {
				*** this is true for Ralink / Mediatek
				...
So in summary, ebase is not allocated but assigned this value:
	ebase = CAC_BASE + (read_c0_ebase() & 0x3ffff000);
which of course leads to the issue we observed.
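
To make the arithmetic concrete, here is a minimal sketch of that computation in plain C. It is not the kernel code: linux_ebase() and ebase_example() are hypothetical names, the register read is replaced by a function argument, and CAC_BASE is assumed to be 0x80000000 as on a 32-bit KSEG0 setup. The mask is grouped with the register value before the addition, so the parentheses are explicit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit KSEG0 base, which CAC_BASE should resolve to here. */
#define CAC_BASE   0x80000000u
/* Mask applied to the CP0 EBase value in the branch without EBase.WG. */
#define EBASE_MASK 0x3ffff000u

/* Hypothetical helper modelling the assignment above; cp0_ebase stands in
 * for the value read_c0_ebase() would return. */
uint32_t linux_ebase(uint32_t cp0_ebase)
{
	return CAC_BASE + (cp0_ebase & EBASE_MASK);
}

/* Worked example with the two EBase values discussed in this thread. */
void ebase_example(void)
{
	/* Value left by U-Boot: 0x87e9b000 & 0x3ffff000 = 0x07e9b000,
	 * plus CAC_BASE gives 0x87e9b000 again. */
	assert(linux_ebase(0x87e9b000u) == 0x87e9b000u);

	/* Original value: the masked part is 0, so ebase stays at CAC_BASE. */
	assert(linux_ebase(0x80000000u) == 0x80000000u);
}
```

So with the EBase value that U-Boot leaves behind, Linux ends up at exactly the same address inside the U-Boot region, while an EBase of 0x80000000 would give CAC_BASE.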
So the ebase set by U-Boot shouldn't be relevant for the Ralink platform.
Why so?
More likely some code at 0x80000000 is overwritten when installing the
exception handlers because all Ralink SoCs except MT7621 have
0xffffffff80000000 defined as load address. So adding something like
0x1000 should fix your problem too.
Hmmm, not sure that I fully understand this. Could you please explain
again?
AFAIK the CPU probing should detect and set cpu_has_veic accordingly.
Yes, I agree.
Maybe it's a bug by Ralink to not set this bit. I guess that's why a
platform could provide a cpu-feature-overrides.h. Or you could configure
CPU_MIPSR2_IRQ_VI as Horatio stated in his response.
I just checked in decode_config3() and MIPS_CPU_VEIC is not set on
this SoC (config3=00002420 MIPS_CONF3_VEIC=00000040).
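
As a cross-check of that value, here is a small sketch that tests the relevant Config3 bits in plain C. The helper names are hypothetical and the bit positions are an assumption taken from the MIPS32 architecture definition of Config3 (VInt at bit 5, VEIC at bit 6), matching the MIPS_CONF3_VEIC value quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed Config3 bit positions (MIPS32 architecture). */
#define CONF3_VINT (1u << 5)	/* vectored interrupts implemented */
#define CONF3_VEIC (1u << 6)	/* external interrupt controller mode */

/* Hypothetical helpers, not kernel API. */
bool config3_has_vint(uint32_t config3)
{
	return (config3 & CONF3_VINT) != 0;
}

bool config3_has_veic(uint32_t config3)
{
	return (config3 & CONF3_VEIC) != 0;
}
```

For config3=0x00002420 the VEIC test is false, consistent with the observation above. Under the same assumed bit layout the VInt bit (0x20) is set, which would fit with CPU_MIPSR2_IRQ_VI being an option on this SoC, but I have not verified this on hardware.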
@Paul regarding MIPS r6, is there some expectation of the bootloader to
set ebase to a reasonable value or to not change the value at all? Maybe
we need to fix U-Boot?
Yes, some advice on how to fix this would be very welcome. I can easily
add CPU_MIPSR2_IRQ_VI and send a patch for this as well.
Thanks,
Stefan