Re: [BUG] bdw-rt5650 DSP boot timeout

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/30/19 1:47 PM, Ranjani Sridharan wrote:
On Tue, 2019-07-30 at 10:45 -0700, Jon Flatley wrote:
On Mon, Jul 29, 2019 at 7:23 PM Pierre-Louis Bossart
<pierre-louis.bossart@xxxxxxxxxxxxxxx> wrote:



On 7/29/19 7:53 PM, Ranjani Sridharan wrote:
On Mon, 2019-07-29 at 18:02 -0500, Pierre-Louis Bossart wrote:

On 7/29/19 4:53 PM, Jon Flatley wrote:
I've been working on upstreaming the bdw-rt5650 machine
driver for
the
Acer Chromebase 24 (buddy). There seems to be an issue when
first
setting the hardware controls that appears to be crashing the
DSP:

[   51.424554] haswell-pcm-audio haswell-pcm-audio: FW
loaded,
mailbox
readback FW info: type 01, - version: 00.00, build 77, source
commit
id: 876ac6906f31a43b6772b23c7c983ce9dcb18a19
...
[   84.924666] haswell-pcm-audio haswell-pcm-audio: error:
audio
DSP
boot timeout IPCD 0x0 IPCX 0x0
[   85.260655] haswell-pcm-audio haswell-pcm-audio: ipc: --
message
timeout-- ipcx 0x83000000 isr 0x00000000 ipcd 0x00000000 imrx
0x7fff0000
[   85.273609] haswell-pcm-audio haswell-pcm-audio: error:
stream
commit failed
[   85.279746]  System PCM: error: failed to commit stream
-110
[   85.285388] haswell-pcm-audio haswell-pcm-audio: ASoC:
haswell-pcm-audio hw params failed: -110
[   85.293963]  System PCM: ASoC: hw_params FE failed -110

This happens roughly 50% of the time when first setting
hardware
controls after a reboot. The other 50% of the time the DSP
comes up
just fine and audio works fine thereafter. Adding "#define
DEBUG 1"
to
sound/soc/intel/haswell/sst-haswell-ipc.c makes the issue
occur
much
less frequently in my testing. Seems like a subtle timing
issue.

There were timing issues encountered during the bringup of
the 2015
chromebook pixel (samus) which uses the bdw-rt5677 machine
driver.
Those were slightly different, and manifested during repeated
arecords. Both devices use the same revision of the sst2
firmware.

Any ideas for how to debug this?

this could be trying to send an IPC while you are already
waiting
for
one to complete. we've seen this before with SOF, if the IPCs
are
not
strictly serialized then things go in the weeds and timeout.

Pierre/Jon

In this case it looks like the DSP boot failed leading to the IPC
timeout? WOndering if increasing the boot timeout would help?

I did actually try this without success.


Yes, that too. The boot timeout is typically experimentally
defined, and
never decreasing due to platform variations...
I am still leaning more on the side of an side effect between two
IPCs,
the added DEBUG points to the printk which solves timing issues.
The
boot timeout would typically not be impacted by such changes.

I think the real struggle I'm having is finding a good debugging
method that doesn't impact the timing of the IPCs significantly (as
adding DEBUG seems to). This could maybe be overcome with using a
stress test to reproduce. The crash only seems to occur when first
booting the DSP, and so far I've been testing this by completely
power
cycling the machine on every test, which is very slow and tedious. So
maybe the issue with DEBUG defined occurs 1 in 20 reboots rather than
1 in 2, I wouldn't know. If there's a way to reboot the DSP and
reproduce this crash without rebooting the entire device that would
be
very helpful to me.
Maybe you've already tried this. But, how about blacklisting the audio
driver and then trying a modprobe/rmmod to insert and remove themodule. This should attempt to boot the DSP upon every modprobe.
But what I am not sure about is whether the rmmod would succeed if the
IPC times out because the DSP has crashed.

I don't think we can really reduce the 'Heisenbug' nature of code instrumentations. But as Ranjani suggested it increasing the test frequency would make things more observable. I would go for suspend-resume tests, that would also force a DSP reboot without requiring a full reboot.

rtcwake -s 3 -m mem

I suspect modprobe/rmmod isn't likely to work, those legacy drivers were not exactly written with stress-test in mind. Suspend-resume is likely more reliable - been used in real products but tested with older kernels so your mileage may vary.

We should really have completed SOF support for Broadwell instead of supporting zombie drivers. Gah.
_______________________________________________
Alsa-devel mailing list
Alsa-devel@xxxxxxxxxxxxxxxx
https://mailman.alsa-project.org/mailman/listinfo/alsa-devel



[Index of Archives]     [ALSA User]     [Linux Audio Users]     [Pulse Audio]     [Kernel Archive]     [Asterisk PBX]     [Photo Sharing]     [Linux Sound]     [Video 4 Linux]     [Gimp]     [Yosemite News]

  Powered by Linux