On 06/26/2014 03:27 AM, Laura Abbott wrote:
> On 6/25/2014 5:13 AM, Tushar Behera wrote:
>> On 06/25/2014 03:59 AM, Laura Abbott wrote:
>>> On 6/24/2014 10:47 AM, Laura Abbott wrote:
>>>> On 6/23/2014 11:32 AM, Kevin Hilman wrote:
>>>>> On Sun, Jun 22, 2014 at 8:56 PM, Tushar Behera <trblinux@xxxxxxxxx> wrote:
>>>>>> Adding linux-samsung-soc and linux-arm-kernel ML for wider audience.
>>>>>>
>>>>>> On 06/19/2014 04:12 PM, Tushar Behera wrote:
>>>>>>> On 06/19/2014 03:02 PM, Tushar Behera wrote:
>>>>>>>> On 06/18/2014 09:22 AM, Kevin Hilman wrote:
>>>>>>>>> On Tue, Jun 17, 2014 at 8:26 PM, Tushar Behera <trblinux@xxxxxxxxx> wrote:
>>>>>>>>>> On 06/17/2014 10:23 PM, Kevin Hilman wrote:
>>>>>>>>>>> Sachin,
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 16, 2014 at 11:16 PM, Kevin's boot bot <khilman@xxxxxxxxxx> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Tree/Branch: mainline
>>>>>>>>>>>> Git describe: v3.16-rc1-2-gebe0618
>>>>>>>>>>>> Failed boot tests (console logs at the end)
>>>>>>>>>>>> ===========================================
>>>>>>>>>>>> exynos5420-arndale-octa: FAIL: arm-exynos_defconfig
>>>>>>>>>>>> ste-snowball: FAIL: arm-u8500_defconfig
>>>>>>>>>>>
>>>>>>>>>>> FYI... these failures are getting more consistent on my octa board,
>>>>>>>>>>> but still not failing every time.
>>>>>>>>>>>
>>>>>>>>>>> Kevin
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Kevin,
>>>>>>>>>>
>>>>>>>>>> Same here.
>>>>>>>>>>
>>>>>>>>>> Observation: If you soft-reset the board (through the jumpers) after
>>>>>>>>>> getting this problem, the problem keeps repeating. But if you hard-reset
>>>>>>>>>> the board (by removing the power cord), the problem doesn't occur during
>>>>>>>>>> next iteration.
>>>>>>>>>
>>>>>>>>> I don't ever use the soft-reset, I only toggle the wall power. I
>>>>>>>>> don't ever actually remove the power cord though, I'm using a
>>>>>>>>> USB-controlled relay to toggle the wall power.
>>>>>>>>>
>>>>>>>>> Kevin
>>>>>>>>>
>>>>>>>>
>>>>>>>> Laura,
>>>>>>>>
>>>>>>>> We are getting the following kernel panic [1] (not always, but quite
>>>>>>>> regularly) while booting Arndale-Octa (based on Samsung's Exynos5420)
>>>>>>>> board with upstream kernel. I haven't observed this issue with other
>>>>>>>> boards yet.
>>>>>>>>
>>>>>>>> This issue is observed when I am booting with uImage + dtb (within
>>>>>>>> roughly 10 iterations).
>>>>>>>>
>>>>>>>
>>>>>>> Some more information:
>>>>>>>
>>>>>>> The boot logs are provided in pastebin, okay[2] and failed[3].
>>>>>>>
>>>>>>> In case of boot failures, I am getting a higher value for vm_total_pages
>>>>>>> (684424 in [3]). In case of successful boot, it is always
>>>>>>> 521232 [2] on my board.
>>>>>
>>>>> I can confirm that reverting the "Get rid of meminfo" patch gets the
>>>>> Octa board booting reliably again for me also.
>>>>>
>>>>> In case it helps, some boot logs for failures from the last couple
>>>>> linux-next build/boot cycles can be seen here:
>>>>> http://armcloud.us/kernel-ci/next/next-20140623/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>>>>> http://armcloud.us/kernel-ci/next/next-20140620/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>>>>>
>>>>
>>>> Sorry, I missed this yesterday. I'm going to take a look.
>>>>
>>>
>>> Were all of
>>>
>>> http://pastebin.com/1iLaizuL
>>> http://pastebin.com/5tdDt4GL
>>> http://armcloud.us/kernel-ci/next/next-20140623/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>>> http://armcloud.us/kernel-ci/next/next-20140620/arm-exynos_defconfig/boot-exynos5420-arndale-octa.log
>>>
>>> collected on the same type of board with the same amount of DRAM? I'm seeing a
>>> different amount of total pages across all those logs. All the logs have the
>>> same lowmem limit so it seems like the upper bound was being calculated
>>> incorrectly for passing to free_area_init_node.
>>> Nothing is immediately jumping
>>> out at me so can you boot up with a small debug patch?
>>>
>>> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
>>> index 659c75d..88eac1f 100644
>>> --- a/arch/arm/mm/init.c
>>> +++ b/arch/arm/mm/init.c
>>> @@ -187,6 +187,8 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
>>>         unsigned long zone_size[MAX_NR_ZONES], zhole_size[MAX_NR_ZONES];
>>>         struct memblock_region *reg;
>>>
>>> +       pr_err("XXXXXXX min %lx max_low %lx max_high %lx\n", min, max_low, max_high);
>>> +       __memblock_dump_all();
>>>         /*
>>>          * initialise the zones.
>>>          */
>>>
>>> It would be helpful to do this across a few bootups to see if the values are
>>> actually consistent. I'll keep looking in the meantime.
>>>
>>> Thanks,
>>> Laura
>>>
>>
>> Thanks Laura for the pointer. In case of error, I am getting some random
>> memblock_add() calls from drivers/of/fdt.c:early_init_dt_scan_memory.
>>
>> The issue seems to be from u-boot, where it is not updating the memory
>> subnode properly. I have got a fix for the u-boot, which I am testing
>> right now. I will update tomorrow after I do some more tests.
>>
>
> I'm concerned my change can stay as is if this is exposing an issue
> in u-boot. Asking people to change bootloaders rarely ends well. Can
> you elaborate on what u-boot is doing that would be exposing this
> issue?
>
> Thanks,
> Laura
>
>

Laura,

Here is my assessment of the current situation.

*Bug in the u-boot*
Current u-boot for the Arndale-Octa board defines NR_BANKS as 12, and the
core uses a global structure (gd->bd) to maintain the start and size of
individual banks.

Depending on the revision of SoC used on the board, the board file [1]
updates the start/size for either 8 or 12 banks. In case of the current
revision of Arndale-Octa boards, the board file always updates
start/size for 8 banks, leaving the start/size data for the remaining 4
banks uninitialized.
But the u-boot core [2] updates the values of all 12 banks, thus
potentially passing on invalid data for the last 4 banks.

The issue can be fixed by resetting the start/size for unused memory
banks to 0/0. [3]

*Before migration to memblock*
DRAM banks were added through [4]. For Exynos systems, NR_BANKS was
defined as 8. The initial check that rejects any banks beyond NR_BANKS
was enough to keep this issue from showing up.

The bootlog [5] (with some debug messages) shows the invalid data, both
in u-boot and kernel. Please grep for "NR_BANKS too low, ignoring
memory" in the bootlog.

*After migration to memblock*
Now that the memory banks are added through [6], all the memory banks
are added unconditionally, resulting in the panic.

IMO, the bug is in u-boot and we should fix that.

[1] https://github.com/tusharbehera/u-boot/blob/tracking-arndale-octa-v2012.07/board/samsung/smdk5420/smdk5420.c#L158
[2] https://github.com/tusharbehera/u-boot/blob/tracking-arndale-octa-v2012.07/arch/arm/lib/bootm.c#L80
[3] https://github.com/tusharbehera/u-boot/commit/9be794e886603a80f2c8686a75187ae67ac2158d
[4] https://github.com/tusharbehera/linux/blob/v3.15-rc1/arch/arm/kernel/setup.c#L629
[5] http://pastebin.com/vLP2oG1mP
[6] https://github.com/tusharbehera/linux/blob/v3.16-rc1/drivers/of/fdt.c#L878

--
Tushar Behera
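
For illustration only, here is a minimal sketch of the failure mode described in this message. It is standalone C, not the actual u-boot or kernel code. The names struct dram_bank, board_dram_init() and count_nonempty_banks() are hypothetical, and the bank addresses and sizes are made-up example values. Only the 12-slot table, the 8 banks filled in by the board code, and the 0/0 reset come from the description above. The idea that a consumer skips zero-sized banks is an assumption made for the sketch.

```c
#include <string.h>

#define NR_BANKS      12  /* slots in the bank table, as described above */
#define BANKS_IN_USE   8  /* banks the board code actually fills in */

/* Hypothetical stand-in for the per-bank start/size kept in gd->bd. */
struct dram_bank {
    unsigned long start;
    unsigned long size;
};

/*
 * Fill in the banks that are really present. With reset_unused == 0 the
 * last 4 slots keep whatever was in memory before (the bug); with
 * reset_unused != 0 they are cleared to 0/0 (the proposed fix).
 */
void board_dram_init(struct dram_bank *banks, int reset_unused)
{
    int i;

    for (i = 0; i < BANKS_IN_USE; i++) {
        /* Example layout only: 8 banks of 256 MiB from 0x20000000. */
        banks[i].start = 0x20000000UL + (unsigned long)i * 0x10000000UL;
        banks[i].size  = 0x10000000UL;
    }

    if (reset_unused)
        memset(&banks[BANKS_IN_USE], 0,
               (NR_BANKS - BANKS_IN_USE) * sizeof(banks[0]));
}

/* Count the banks a consumer would register if it skips empty entries. */
int count_nonempty_banks(const struct dram_bank *banks)
{
    int i, n = 0;

    for (i = 0; i < NR_BANKS; i++)
        if (banks[i].size != 0)
            n++;
    return n;
}
```

If the table is pre-filled with stale bytes (as it might be after a soft reset) and board_dram_init() is called without the reset, all 12 slots look like valid banks. With the reset, only the 8 real banks remain.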