Re: [PATCH] m68k: allow ColdFire m5441x parts to run with MMU enabled

Greg Ungerer <gregungerer@xxxxxxxxxxxxxx> · Tue, 22 Aug 2017 11:08:43 +1000

Hi Angelo,

On 22/08/17 10:35, Angelo Dureghello wrote:
On 21/08/2017 09:15, Greg Ungerer wrote:
On 20/08/17 23:26, Angelo Dureghello wrote:
On 20/08/2017 14:44, Greg Ungerer wrote:
On 18/08/17 01:02, Angelo Dureghello wrote:
On 14/08/2017 06:16, Greg Ungerer wrote:
On 12/08/17 21:17, Angelo Dureghello wrote:
On 10/08/2017 09:06, Greg Ungerer wrote:
On 10/08/17 01:32, Angelo Dureghello wrote:
[snip]
sure, on this board  http://sysam.it/cff_stmark2.html
there are 128MB of ddr2.

External SDRAM is accessible, at least without any mmc support enabled,
from 0x40000000.

I have following test config:

     GNU nano 2.8.6 File: arch/m68k/configs/stmark2_defconfig

CONFIG_LOCALVERSION="stmark2-001"
[snip]

I tried a bit more yesterday, but it seems there is not much support for
earlyprintk / low level debug for this architecture.

If needed i can try with a gpio toggling routine, at least to find
where the kernel stops.

The attached patch is a quick and dirty early console output method.
It works for me on the m5475, and should work for you "as is" on the 5441x too.

It is kind of an early printk. Of course it still needs the early
kernel boot to have succeeded before you will get anything much coming out.
But it is worth trying.

Ok many thanks. Btw i used a __square(); function written in asm, so i am
sure i see the gpio toggling in very early stages.
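
For reference, a gpio checkpoint marker of the kind described above could
look roughly like the sketch below. It is not the __square() routine
mentioned here (that one is in asm and was not posted). The register
pointer and pin mask are hypothetical parameters, since the real GPIO
output register address is board specific and not given in this thread.

```c
#include <stdint.h>

/*
 * Hypothetical debug helper: emit 'count' high/low pulses on one pin.
 * 'reg' stands in for a memory-mapped GPIO output register and 'mask'
 * for the pin bit; both are assumptions for illustration only.
 */
static void debug_pulses(volatile uint8_t *reg, uint8_t mask,
                         unsigned int count)
{
    while (count--) {
        *reg |= mask;            /* drive the pin high */
        *reg &= (uint8_t)~mask;  /* drive the pin low again */
    }
}
```

Calling it with a different pulse count at each checkpoint (1 at the first,
2 at the second, and so on) lets a scope show how far the boot got. Other
bits in the register are left unchanged.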

I am wondering if the non-0 base RAM may be a problem. I have only run
the MMU enabled code on platforms with 0 based RAM so far. But let's see if
the early console trace attached gives us anything before digging into that.

This MCU has sdram area physically mapped at 0x4000 0000 so U-Boot, to be
able to execute the kernel must load it to that location/area anyway.

But i have seen that it is not a problem, after MMU is enabled in head.S
the jump
        movel   #_vstart,%a0      /* jump to "virtual" space */
        jmp     %a0@

works fine. Since that range is not hitting anything that is maintained
physical, it can be translated into virtual without any issue.

Yeah, it is not so much the initial start up that I think will
be the problem. More the setup of the MMU mapping tables later
in boot.

After some hard debug, i see the execution stops at:

asmlinkage __visible void __init start_kernel(void)
     ...
     setup_arch(&command_line);      setup_mm.c
        ...
        paging_init();               mm/mcfmmu.c
           ...
           empty_zero_page = (void *) alloc_bootmem_pages(PAGE_SIZE);
           ^line 47 mcfmmu.c

Inside alloc_bootmem_pages(), execution seems to finally end up in
mm/bootmem.c, likely in alloc_bootmem_bdata().
If needed i can dig further to find the exact place where execution stops,
but i suspect the while(1) at line 545.

As a curious thing, i find in a different cf CPU code "m54xx.c"
the following:

void __init config_BSP(char *commandp, int size)
{
#ifdef CONFIG_MMU
      cf_bootmem_alloc();
      mmu_context_init();
#endif
Does m5441x.c maybe need these calls too?

Yes, you will need this. So that code above is only getting run when
configured for a 547x CPU family. Attached is a rework of that code
so that it will be run for all ColdFire MMU variants. Can you try
that out?

Would be very nice to have MMU working. Strangely, i don't see any
board_config with it enabled. Was it ever tested on some Coldfire ?

Oh, yeah, I run this on a real M5475 EVB board for every kernel
mainline release, with and without MMU enabled. See the
arch/m68k/configs/m5475evb_defconfig, it will default to having
the MMU enabled.

I have today's linux-4.13-rc5 running on it here now:

# cat /proc/version
Linux version 4.13.0-rc5-00001-gb014090-dirty (gerg@goober) (gcc version 5.4.0 (GCC)) #1 Mon Aug 14 10:14:12 AEST 2017

# cat /proc/cpuinfo
CPU:            ColdFire
MMU:            ColdFire
FPU:            ColdFire
Clocking:       264.1MHz
BogoMips:       264.19
Calibration:    1320960 loops
#

Regards
Greg

Ok, i applied your patch, and still the kernel is hanging silently,
so i started up a new debug session again.

What is actually happening (after your patch has been applied) is:

setup_arch()                          arch/m68k/kernel/setup_mm.c
   paging_init()
      memmap_init()                   mm/page_alloc.c
         memmap_init_zone()
            __init_single_page()
               set_page_links()       include/linux/mm.h
                  set_page_zone()
                     kernel hangs silently on this line
                     page->flags &= ~(ZONES_MASK << ZONES_PGSHIFT);

Can you run your current code with the console debug code I sent
a little while back?

I ask because I suspect it should give something based on your debug
above. I played around a little trying to fake out my configuration
to make it look like the RAM was non-zero based. I couldn't get a fail,
but I would like to add some more debug to see what is going on with
the page pointers from your debug.

Can you apply the attached patch and get any extra debug?

I am wondering how the mmu works. At the moment the mmu is enabled,
in head.S, i would expect that code compiled for 0x40001000 would
not run, since jumps would be translated to some different physical
addresses, but execution still works.
In the same way, after enabling the mmu i would expect .data vars to be
invalid, since their addresses would be translated to a different
location, but the init values of .data variables are still
valid. I am interested in understanding these points.

On the ColdFire the kernel relies on all RAM (and IO peripheral
addresses) to "hit" the ACR registers - and essentially be passed
through as an identity physical = virtual mapping. If you look at
the operation of the memory address translation when virtual mode
is enabled (in the ColdFire MMU sections of the 5475 and 54411
reference manuals) you will see that addresses are checked in order
to be for the MMUBAR, RAMBAR, ACR, then MMU.

For example a kernel address when in supervisor mode will hit
ACR1 or ACR3 the way we set them up in arch/m68k/coldfire/head.S.
And that is why you see kernel code and data still being valid after
the MMU is enabled in virtual mode. No TLB entries required for this.
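
To make the ACR "hit" concrete, here is a minimal C model of the address
compare. It is a simplified sketch, not the kernel's code: the field
positions (base in bits 31-24, mask in bits 23-16, enable in bit 15) are
my reading of the ColdFire reference manuals and should be checked there,
the supervisor/user mode field is ignored, and the example ACR value in
the note below is hypothetical rather than what head.S actually programs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ACR field layout; verify against the reference manual. */
#define ACR_BASE(acr)  (((acr) >> 24) & 0xffu)  /* address base, A[31:24] */
#define ACR_MASK(acr)  (((acr) >> 16) & 0xffu)  /* set bits are "don't care" */
#define ACR_ENABLE     (1u << 15)

/*
 * Return true if 'addr' would hit this ACR, in which case it is passed
 * through untranslated (physical = virtual) and no TLB entry is needed.
 * The supervisor/user mode match is deliberately left out of this model.
 */
static bool acr_matches(uint32_t acr, uint32_t addr)
{
    uint32_t care;

    if (!(acr & ACR_ENABLE))
        return false;

    care = ~ACR_MASK(acr) & 0xffu;
    return (((addr >> 24) ^ ACR_BASE(acr)) & care) == 0;
}
```

With a hypothetical ACR value of 0x40078000 (base 0x40, mask 0x07, enabled)
the model matches 0x40000000 through 0x47ffffff, so an address such as
0x40001000 hits the ACR while 0x48000000 falls through to the MMU.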

Looking at your call sequence above I can see that the physical
RAM start address being non-zero is going to come into play. I'll
dig into this a little more tomorrow and see if I can figure out what
is going on.

Thanks for the kind clarifications.

I'll look into these things too in the next days, learning is always nice.
Btw, about load/entry address, i have noticed a possible basic
difference between the mcf5441x and mcf547x series:

The second one (your cpu) is v4e and probably more recent i guess, and
one major difference from the datasheet seems to be that it is Harvard.
So probably, for this reason, you can address ram from 0 there.

IIRC the 5475 was the first ColdFire with MMU, it is pretty old. Pretty
sure the 54411 came later. Not sure what the thinking was on the different
default memory layout though.

Finally, cleaning out my debug lines, i found i had removed an important line.
So i am back to the original "second" error we were trying to understand.

So the current, clearer status is:

U-Boot 2017.09-rc2-00151-g2d7cb5b426-dirty (Aug 22 2017 - 00:22:46 +0200)

CPU:   Freescale MCF54410 (Mask:9f Version:2)
       CPU CLK 240 MHz BUS CLK 120 MHz FLB CLK 60 MHz
       INP CLK 30 MHz VCO CLK 480 MHz
SPI:   ready
DRAM:  128 MiB
SF: Detected is25lp128 with page size 256 Bytes, erase size 64 KiB, total 16 MiB
In:    serial
Out:   serial
Err:   serial
Hit any key to stop autoboot:  0
SF: Detected is25lp128 with page size 256 Bytes, erase size 64 KiB, total 16 MiB
device 0 offset 0x100000, size 0x1d9728
SF: 1939240 bytes @ 0x100000 Read: OK
## Booting kernel from Legacy Image at 40001000 ...
   Image Name:   mainline kernel
   Created:      2017-08-22   0:07:25 UTC
   Image Type:   M68K Linux Kernel Image (uncompressed)
   Data Size:    1939176 Bytes = 1.8 MiB
   Load Address: 40001000
   Entry Point:  40001000
   Verifying Checksum ... OK
   Loading Kernel Image ... OK
Linux version 4.12.0stmark2-001-11691-g571d81b2b55f-dirty (angelo@jerusalem) (gcc version 4.9.0 (crosstools-sysam-2016.04.16)) #182 Tue Aug 22 02:07:24 CEST 2017
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:6219 free_area_init_node+0x2f4/0x2fa
CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0stmark2-001-11691-g571d81b2b55f-dirty #182
Stack from 4017deec:
        4017deec 4017b3dd 40007972 00000000 00000000 47d9f62c 00020000 00000000
        00000000 4017df9c 40007a14 4016dd8e 0000184b 4019caca 00000009 00000000
        00000000 4019caca 4016dd8e 0000184b 48000000 40204000 47d9f62c 40001000
        00000000 47d9ef1c 40001480 4013010c 4012cd16 4017dfa8 4019ecc0 00012000
        00002000 4019ccb4 00000000 4017df9c 00020000 00000000 4019a3f2 4017df9c
        00000001 401da8c0 401da774 4019ebc8 00004000 00000000 00000000 4017dfc8

Call Trace:
 [<40007972>] __warn+0xa4/0xc0
 [<40007a14>] warn_slowpath_null+0x1a/0x22
 [<4019caca>] free_area_init_node+0x2f4/0x2fa
 [<4019caca>] free_area_init_node+0x2f4/0x2fa
 [<40001000>] kernel_pg_dir+0x0/0x1000
 [<40001480>] kernel_pg_dir+0x480/0x1000
 [<4013010c>] memset+0x0/0x80
 [<4012cd16>] strlen+0x0/0x14
 [<4019ecc0>] __alloc_bootmem+0x16/0x3c
 [<4019ccb4>] free_area_init+0x20/0x26
 [<4019a3f2>] paging_init+0xee/0xfa
 [<4019ebc8>] free_bootmem_node+0x0/0x34
 [<40199fbc>] setup_arch+0xcc/0x16e
 [<40024eb2>] printk+0x0/0x18
 [<4019ecaa>] __alloc_bootmem+0x0/0x3c
 [<40198550>] start_kernel+0x68/0x3ae
 [<40001000>] kernel_pg_dir+0x0/0x1000
 [<400020f2>] _exit+0x0/0x6

---[ end trace 0000000000000000 ]---
On node 0 totalpages: 16384
free_area_init_node: node 0, pgdat 401da8c0, node_mem_map a8c0401d
                                                          ^^^^^^^^
  DMA zone: 72 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 16384 pages, LIFO batch:3
/page_alloc.c(1171): page=a8c0401d pfn=131072
                          ^^^^^^^^

Ok, this is getting somewhere. Clearly that page pointer is not valid.

I'll have a dig around and see if I can figure out what might cause that.
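
As a quick sanity check on the numbers in the debug output above, the
arithmetic can be sketched in C. This assumes an 8 KiB page size (a page
shift of 13), which is an assumption here, though it is consistent with
16384 pages for 128 MiB of DRAM. The helper names are made up for
illustration and are not kernel functions.

```c
#include <stdint.h>

/* Assumed page shift: 8 KiB pages (16384 pages * 8 KiB = 128 MiB). */
#define PAGE_SHIFT_ASSUMED 13u

/* Convert a page frame number to the physical address of that page. */
static inline uint32_t pfn_to_phys(uint32_t pfn)
{
    return pfn << PAGE_SHIFT_ASSUMED;
}

/* Convert a physical address to its page frame number. */
static inline uint32_t phys_to_pfn(uint32_t phys)
{
    return phys >> PAGE_SHIFT_ASSUMED;
}

/* Swap the upper and lower 16-bit halves of a 32-bit value. */
static inline uint32_t swap_halves(uint32_t v)
{
    return (v << 16) | (v >> 16);
}
```

Under that assumption pfn 131072 corresponds to physical address 0x40000000,
that is, the first page of the non-zero based RAM, so the pfn itself looks
sane and only the page pointer is off. It may also be worth noting, purely
as an observation, that the bogus node_mem_map value a8c0401d is the pgdat
address 401da8c0 with its 16-bit halves swapped.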

Regards
Greg