On Tue, Jul 30, 2019 at 12:04 PM Pierre-Louis Bossart <pierre-louis.bossart@xxxxxxxxxxxxxxx> wrote: > > On 7/30/19 1:47 PM, Ranjani Sridharan wrote: > > On Tue, 2019-07-30 at 10:45 -0700, Jon Flatley wrote: > >> On Mon, Jul 29, 2019 at 7:23 PM Pierre-Louis Bossart > >> <pierre-louis.bossart@xxxxxxxxxxxxxxx> wrote: > >>> > >>> > >>> > >>> On 7/29/19 7:53 PM, Ranjani Sridharan wrote: > >>>> On Mon, 2019-07-29 at 18:02 -0500, Pierre-Louis Bossart wrote: > >>>>> > >>>>> On 7/29/19 4:53 PM, Jon Flatley wrote: > >>>>>> I've been working on upstreaming the bdw-rt5650 machine > >>>>>> driver for > >>>>>> the > >>>>>> Acer Chromebase 24 (buddy). There seems to be an issue when > >>>>>> first > >>>>>> setting the hardware controls that appears to be crashing the > >>>>>> DSP: > >>>>>> > >>>>>> [ 51.424554] haswell-pcm-audio haswell-pcm-audio: FW > >>>>>> loaded, > >>>>>> mailbox > >>>>>> readback FW info: type 01, - version: 00.00, build 77, source > >>>>>> commit > >>>>>> id: 876ac6906f31a43b6772b23c7c983ce9dcb18a19 > >>>>>> ... > >>>>>> [ 84.924666] haswell-pcm-audio haswell-pcm-audio: error: > >>>>>> audio > >>>>>> DSP > >>>>>> boot timeout IPCD 0x0 IPCX 0x0 > >>>>>> [ 85.260655] haswell-pcm-audio haswell-pcm-audio: ipc: -- > >>>>>> message > >>>>>> timeout-- ipcx 0x83000000 isr 0x00000000 ipcd 0x00000000 imrx > >>>>>> 0x7fff0000 > >>>>>> [ 85.273609] haswell-pcm-audio haswell-pcm-audio: error: > >>>>>> stream > >>>>>> commit failed > >>>>>> [ 85.279746] System PCM: error: failed to commit stream > >>>>>> -110 > >>>>>> [ 85.285388] haswell-pcm-audio haswell-pcm-audio: ASoC: > >>>>>> haswell-pcm-audio hw params failed: -110 > >>>>>> [ 85.293963] System PCM: ASoC: hw_params FE failed -110 > >>>>>> > >>>>>> This happens roughly 50% of the time when first setting > >>>>>> hardware > >>>>>> controls after a reboot. The other 50% of the time the DSP > >>>>>> comes up > >>>>>> just fine and audio works fine thereafter. Adding "#define > >>>>>> DEBUG 1" > >>>>>> to > >>>>>> sound/soc/intel/haswell/sst-haswell-ipc.c makes the issue > >>>>>> occur > >>>>>> much > >>>>>> less frequently in my testing. Seems like a subtle timing > >>>>>> issue. > >>>>>> > >>>>>> There were timing issues encountered during the bringup of > >>>>>> the 2015 > >>>>>> chromebook pixel (samus) which uses the bdw-rt5677 machine > >>>>>> driver. > >>>>>> Those were slightly different, and manifested during repeated > >>>>>> arecords. Both devices use the same revision of the sst2 > >>>>>> firmware. > >>>>>> > >>>>>> Any ideas for how to debug this? > >>>>> > >>>>> this could be trying to send an IPC while you are already > >>>>> waiting > >>>>> for > >>>>> one to complete. we've seen this before with SOF, if the IPCs > >>>>> are > >>>>> not > >>>>> strictly serialized then things go in the weeds and timeout. > >>>> > >>>> Pierre/Jon > >>>> > >>>> In this case it looks like the DSP boot failed leading to the IPC > >>>> timeout? WOndering if increasing the boot timeout would help? > >> > >> I did actually try this without success. > >> > >>> > >>> Yes, that too. The boot timeout is typically experimentally > >>> defined, and > >>> never decreasing due to platform variations... > >>> I am still leaning more on the side of an side effect between two > >>> IPCs, > >>> the added DEBUG points to the printk which solves timing issues. > >>> The > >>> boot timeout would typically not be impacted by such changes. > >> > >> I think the real struggle I'm having is finding a good debugging > >> method that doesn't impact the timing of the IPCs significantly (as > >> adding DEBUG seems to). This could maybe be overcome with using a > >> stress test to reproduce. The crash only seems to occur when first > >> booting the DSP, and so far I've been testing this by completely > >> power > >> cycling the machine on every test, which is very slow and tedious. So > >> maybe the issue with DEBUG defined occurs 1 in 20 reboots rather than > >> 1 in 2, I wouldn't know. If there's a way to reboot the DSP and > >> reproduce this crash without rebooting the entire device that would > >> be > >> very helpful to me. > > Maybe you've already tried this. But, how about blacklisting the audio > > driver and then trying a modprobe/rmmod to insert and remove themodule. This should attempt to boot the DSP upon every modprobe. > > But what I am not sure about is whether the rmmod would succeed if the > > IPC times out because the DSP has crashed. > > I don't think we can really reduce the 'Heisenbug' nature of code > instrumentations. > But as Ranjani suggested it increasing the test frequency would make > things more observable. I would go for suspend-resume tests, that would > also force a DSP reboot without requiring a full reboot. > > rtcwake -s 3 -m mem > > I suspect modprobe/rmmod isn't likely to work, those legacy drivers were > not exactly written with stress-test in mind. Suspend-resume is likely > more reliable - been used in real products but tested with older kernels > so your mileage may vary. > > We should really have completed SOF support for Broadwell instead of > supporting zombie drivers. Gah. I've been off this issue for a couple of weeks but yesterday I made some progress. There seems to be an issue when suspending the ALC5650. I think the nondeterministic behavior I was seeing just had to do with whether or not the DSP had yet suspended. I reverted commit 0d2135ecadb0 ("ASoC: Intel: Work around to fix HW D3 potential crash issue") and things started working, including suspend/resume of the DSP. Any ideas for why this may be? I would like to resolve this so I can finish upstreaming the bdw-rt5650 machine driver. Thanks, -Jon _______________________________________________ Alsa-devel mailing list Alsa-devel@xxxxxxxxxxxxxxxx https://mailman.alsa-project.org/mailman/listinfo/alsa-devel