On 06/17/2015 05:12 PM, Dinh Nguyen wrote:
> On 06/17/2015 04:30 PM, Russell King - ARM Linux wrote:
>> On Wed, Jun 17, 2015 at 03:35:13PM -0500, Dinh Nguyen wrote:
>>> On Mon, Jun 1, 2015 at 6:50 AM, Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
>>>> Hi Russell,
>>>>
>>>> On Mon, Jun 1, 2015 at 12:53 PM, Russell King - ARM Linux
>>>> <linux@xxxxxxxxxxxxxxxx> wrote:
>>>>> On Mon, Jun 01, 2015 at 12:41:01PM +0200, Geert Uytterhoeven wrote:
>>>>>> FWIW, I have the feeling this has a slight influence on boot reliability on
>>>>>> two of my boards:
>>>>>> - r8a7740/armadillo, which is known to suffer from a cache-related bug in
>>>>>> its bootloader, seems to have a higher chance of booting successfully on
>>>>>> cold boot,
>>>>>> - sh73a0/kzm9g, which has known cache-issues with secondary CPU boot up,
>>>>>> seems to have a lower chance of booting successfully.
>>>>>>
>>>>>> No time to spend all week turning this into a statistically significant test
>>>>>> project... The reset button is my friend...
>>>>>
>>>>> Damn it, you sent this right after I merged and pushed out this change in
>>>>> my for-arm-soc branch, and was just about to send it to the arm-soc people.
>>>>> What excellent timing you have. :)
>>>>
>>>> Don't worry, I didn't send that email to make you postpone this change.
>>>> Given the fuzziness of reproduction, and the flakiness (esp. on Armadillo)
>>>> of the boot loader, and these are old SoCs, please go ahead.
>>>>
>>>>> What happens on the kzm9g if you revert the mach-shmobile changes?
>>>>
>>>> Seems to make no difference.
>>>>
>>>>> For armadillo, do you use the decompressor? That should be doing all the
>>>>> cache cleaning already, prior to the kernel being entered.
>>>>
>>>> I think so.
>>>>
>>>> Corruption pattern ranges from lock up, over "Error: unrecognized/unsupported
>>>> machine ID", to booting almost completely, but lacking a few devices due to
>>>> a corrupted DTB. Been like that as long as I remember, i.e. since I got the
>>>> board ca. 1 year ago. Boots fine (100%) with kexec.
>>>>
>>>
>>> It seems like this patch is causing the SoCFPGA to not boot with SMP
>>> reliably. About 1 out of every 10 reboots, I'm seeing the boot failure
>>> below. The error seems to only happen when I do a cold or warm reboot,
>>> but never occurs during a power-up. If I revert this patch, or put
>>> back the call to v7_invalidate_l1 in socfpga_secondary_startup, then
>>> it's able to boot 100% of the time.
>>
>> It really sucks that you're only just testing this change now, because
>> I've frozen my tree, and removing it for the next merge window is going
>> to be an entirely non-trivial matter. You were copied on the original
>> patch, which you failed to test... I can't say I have _much_ sympathy
>> for a bug report at this point in time.
>>
>
> I apologize for not catching this error while testing this patch. But I
> did test it when you first sent it out. I probably didn't do a stress
> test. Sometimes the reboot fails in the 1st attempt, sometimes it fails
> in the 9th attempt.
>
> I only caught this error when I was testing my recent changes to use
> CPU_METHOD_OF_DECLARE.
>
> For me, I don't think you need to revert this patch or anything, but a
> fix can go in for a -rcX?
>

Also, I am not seeing the error on the SoCFPGA Arria 10 platform at all.
The Arria 10 platform is running a different version of the bootloader
than the Cyclone5, although I did also test with the latest version of
U-Boot on the Cyclone5.

Dinh