* Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> [140903 09:46]:
> On 09/02/2014 10:15 PM, Tony Lindgren wrote:
> >> - I seem to face two kinds of "deaths":
> >> - the LED still goes on and off and the UART just does not respond,
> >> even if I tell the button to print something on the screen (the button
> >> also changes the frequency of the LED, so I know that the button is
> >> doing something).
> >> Also, from dumping the content of /proc/interrupts it seems that a
> >> wakeup happens, so the UART should have restored its registers.
> >
> > OK yeah this is the case I was seeing too. So do you just set the
> > LED triggers to none in sysfs to make it easier to reproduce?
>
> Yes.
>
> >> - one where the system is dead and the LED does not blink anymore.
> >> Also my button is dead.
> >
> > This I don't think I've seen. This could also be the errata issue on
> > your earlier rev beagleboard-xm with off-idle.
>
> Might be.
>
> Your pstore hint gave me something. I tried that earlier but somehow
> assumed that the DRAM content was wiped on init. But the content is
> still there even after pressing the reset button :)

Yeah pstore is very nice for debugging mystery hangs :)

> However, I was able to capture the case where the LED was not blinking:
> the IIR register says 0xc6 (=> line status error). That is okay. At the
> same time the LSR register says 0xe0. This is not okay. It means that
> there is some kind of error and that at least one error bit is set in
> this register, which is not the case. Also, those bits are cleared on
> read, which does not happen here. And we loop forever, so the LED does
> not blink anymore.

OK

> The RX-count register says that the FIFO is empty, which makes sense
> because bit 0 is not set (in LSR). However, I can read from the RX FIFO
> multiple times until I get the "unhandled bus access" error, which
> usually happens right away if the empty FIFO is read on omap3 HW. In
> the last test I managed to read it 91 times before the crash.
> I hoped that this FIFO read would make the interrupt go away, but it
> did not.
>
> The HW seems to be in a strange state. It might be either the errata
> or something else. I even took the resume routine from omap-serial in
> case I did something wrong. In my last test it worked for 10 minutes
> before the interrupt storm came.
>
> This is probably the same thing I see on the omap-serial driver, where
> I got this from pstore:
>
> [   32.659271] random: nonblocking pool is initialized
> [  212.170623] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
> [swapper:0]
>
> So I *guess* the interrupt routine is looping. This is problem one; no
> idea what is going on (the register status captured on 8250-omap makes
> no sense).

See recent commit cc824534d4fe, and try commenting out the check for
HWMOD_FORCE_MSTANDBY in omap_hwmod.c so that _reconfigure_io_chain() is
always called. If that changes something, we at least have some idea.

It could also be the wake-up interrupt looping. So you may also want to
try adding some printks (pstore only) into omap_prcm_irq_handler() and
omap3xxx_prm_clear_mod_irqs(), as that code handles the wake-up event
interrupts.

> Problem two, where the UART does not wake up:
> What I observed is that sometimes the UART does not wake up properly,
> i.e. it does not write anything on the console, even when it should. I
> can't tell whether reading works properly; writing does not.
> From my capture I see that the resume routine was running and the
> registers should have been written. That means the UART should be up
> and running, but nothing happens.

This also seems to hint at something needing _reconfigure_io_chain() to
be called, along the lines of commit cc824534d4fe.

> It often works again after the system comes out of resume again (i.e.
> RPM suspends and resumes the UART). So it is okay on the next wakeup, or
> the wakeup after next.
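For reference, a small sketch decoding the two register values Sebastian captured in the hang case (IIR 0xc6, LSR 0xe0). It ASSUMES the OMAP UART follows the standard 16550 bit layout for IIR and LSR; the mask names are borrowed from Linux's include/uapi/linux/serial_reg.h, and the decode_iir/decode_lsr helpers are hypothetical, written only for illustration. It makes the mismatch explicit: IIR reports a pending receiver line status interrupt (RLSI), while none of the LSR break/framing/parity/overrun bits are set and DR is clear; only the FIFO error summary bit and the two transmitter-empty bits are set.

```shell
#!/bin/bash
# Decode captured UART IIR/LSR values, ASSUMING the standard 16550 bit
# layout. Mask names are borrowed from Linux's
# include/uapi/linux/serial_reg.h; the helper functions are hypothetical.

UART_IIR_NO_INT=0x01          # bit 0 set: no interrupt pending
UART_IIR_ID=0x0e              # bits 3:1: interrupt ID
UART_IIR_RLSI=0x06            # ID for "receiver line status"

UART_LSR_FIFOE=0x80           # error(s) somewhere in the RX FIFO
UART_LSR_TEMT=0x40            # transmitter empty
UART_LSR_THRE=0x20            # transmit holding register empty
UART_LSR_BI=0x10              # break indication
UART_LSR_FE=0x08              # framing error
UART_LSR_PE=0x04              # parity error
UART_LSR_OE=0x02              # overrun error
UART_LSR_DR=0x01              # receive data ready
UART_LSR_BRK_ERROR_BITS=0x1e  # BI | FE | PE | OE

# Print the pending interrupt source encoded in an IIR value.
decode_iir() {
    local iir=$(( $1 ))
    if (( iir & UART_IIR_NO_INT )); then
        echo "none"
    elif (( (iir & UART_IIR_ID) == UART_IIR_RLSI )); then
        echo "RLSI"
    else
        printf 'ID=0x%02x\n' $(( iir & UART_IIR_ID ))
    fi
}

# Print the names of the LSR bits that are set, highest bit first.
decode_lsr() {
    local lsr=$(( $1 )) name mask_var
    local -a flags=()
    for name in FIFOE TEMT THRE BI FE PE OE DR; do
        mask_var="UART_LSR_$name"
        if (( lsr & ${!mask_var} )); then
            flags+=("$name")
        fi
    done
    echo "${flags[*]}"
}

# Values captured from the hung system.
iir=0xc6
lsr=0xe0
echo "IIR $iir: $(decode_iir "$iir")"
echo "LSR $lsr: $(decode_lsr "$lsr")"
printf 'LSR line-error bits (BI|FE|PE|OE): 0x%02x\n' \
    $(( lsr & UART_LSR_BRK_ERROR_BITS ))
```

Running it prints RLSI for the IIR, FIFOE TEMT THRE for the LSR, and 0x00 for the line-error bits. It says nothing about why the hardware ends up in that state; it only shows why an interrupt handler that waits for an LSR read to clear the line status condition could keep looping.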
> From the script:
>
> | while ((1))
> | do
> |
> | echo -n 409-chars >/dev/ttyUSB0
> |
> | sleep 1
> | a=$(date)
> | echo -e "\n#$a" >/dev/ttyUSB0
> | echo $a
> | sleep 13;
> | done
>
> I see that sometimes one or two consecutive timestamps are missing. And
> then it continues as if nothing happened.

OK. At least it's now starting to sound like the bugs are pretty much
the same with 8250 and serial-omap :)

Regards,

Tony