Re: Reset on Beaglebone Black has become unreliable/broken

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 28.11.24 10:46, Konstantin Kletschke wrote:
> On Thu, Nov 28, 2024 at 10:23:10AM +0100, Ahmad Fatoum wrote:
> 
>> I assume this should be v2022.04? -dirty means you have local patches
>> on top. Do any of them touch SoC-specific, board-specific parts
>> like clock or power?
> 
> Yes, it is "barebox 2022.04.0-dirty #1 Tue Sep 10 08:45:54 UTC 2024".
> The patches we apply do not touch any clock or power, we touch:
> Environment, kernel cmdline, watchdog settings, bootchooser config, 
> autoabortkey. Config stuff.
> 
>> What changed over the last week on the software side? I understand barebox
>> stayed the same? Is the kernel still the same?
> 
> We changed nothing. I use to ship this barebox version with kernel for a
> couple of months. Last week we only ramped up quantity but the fails are
> so high in percentage it should had happened a couple of times before.

Are you still building with the same toolchain?

>> On affected hardware: Does this happen always or only some times?
> 
> Always. Easy reproducable.
> Meanwhile I realized on affected BBBs it can be reproduced this way:
> 
> Boot, hit Ctrl-C to stop barebox at prompt.
> Hit S1 button which is wired to NRESET_INOUT ball A10 (its not S2 as I
> initially wrote, S1).
> System is stuck/frozen/dead.

So repeating these steps on some boards never shows any issues and on
some others it always shows issues?

>> This sounds very similar to the issue fixed in commit 9c1a78f959dd
>> ("Revert "ARM: beaglebone: init MPU speed to 800Mhz""), but that's already
>> included in v2022.04.0, hence the question if you have patches that
>> do anything similar.
> 
> Sounds interesting, I will take a look. As said, we patch no clock
> voltages or something like that.

Ok.

>> Yes, but it sounds strange that only now these problems pop up?
> 
> Yes. Last week we started to experience this problem in production, we
> have ~200 working BBBs, ~20 have this problem. The batch worked
> flawlessly but suddenly a couple of broken BBBs kinda heaped one day,
> now sometimes this happens.
> 
> I am even not so shure if software is to blame or if the hardware is or
> has become glitchy, but falsinh stock u-boot still is able to
> reset/restart on its own on these devices.

My guess would be an incompatibility between the settings in the PMIC
and what barebox configures. barebox doesn't touch the PMIC and tries
to use clock rates that should be safe regardless of what changes Linux
did to the PMIC.

U-Boot, depending on version, may be reprogramming the PMIC to allow
for higher clock rates that barebox doesn't currently go for and this
might be related to the issues you are seeing.

>> Besides checking what changed, you should check if Linux is playing
>> around with the voltages powering the SoC and if it does, disable that
>> to see if it improves the situation.
> 
> Sadly (or gladly?) linux is not involved on affected BBBs. Boot, stop in
> bootloader, hit S1, system freezes.

So this happens even after a completely cold reset?

>> Your barebox restart handler is probably am33xx_restart_soc (named
>> "soc" in reset -l output).
> 
> I will poke around, never in my life was dealing with reset code :-)

I'd suggest you enable CONFIG_DEBUG_LL and look if you see at least a >
character on the serial console output by the MLO.

If you don't see it, try moving these lines:

  am33xx_uart_soft_reset((void *)AM33XX_UART0_BASE);
  am33xx_enable_uart0_pin_mux();
  omap_debug_ll_init();
  putc_ll('>');

to the start of beaglebone_sram_init() and see if you get the > printed.

The point is making sure that barebox itself starts up before seeing where
it's getting stuck.

Cheers,
Ahmad

> 
> Regards
> Konsti
> 
> 


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |




[Index of Archives]     [Linux Embedded]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux