Re: [PATCH] ALSA: hda - delay resume haswell hdmi codec in system resume

Takashi Iwai <tiwai@xxxxxxx> · Fri, 05 Apr 2013 19:04:40 +0200

At Fri, 5 Apr 2013 16:42:43 +0000,
Lin, Mengdong wrote:
> 
> Hi Takashi and David,
> 
> > -----Original Message-----
> > From: Takashi Iwai [mailto:tiwai@xxxxxxx]
> > Sent: Friday, April 05, 2013 9:02 PM
> > To: David Henningsson
> > Cc: Lin, Mengdong; alsa-devel@xxxxxxxxxxxxxxxx
> > Subject: Re:  [PATCH] ALSA: hda - delay resume haswell hdmi codec
> > in system resume
> > 
> > At Tue, 02 Apr 2013 14:49:40 +0200,
> > David Henningsson wrote:
> > >
> > > On 04/02/2013 02:18 PM, Takashi Iwai wrote:
> > > > At Tue, 02 Apr 2013 13:53:16 +0200,
> > > > David Henningsson wrote:
> > > >>
> > > >> On 03/27/2013 11:01 AM, Lin, Mengdong wrote:
> > > >>>> -----Original Message-----
> > > >>>> From: David Henningsson [mailto:david.henningsson@xxxxxxxxxxxxx]
> > > >>>> Sent: Wednesday, March 27, 2013 4:19 PM
> > > >>>
> > > >>>>> The Haswell codec set_power_state ops
> > > >>>>> (intel_haswell_set_power_state) will
> > > >>>> only wait if this is a delayed resume and clear this flag after
> > > >>>> waiting. And actually, there is no waiting even in this case.
> > > >>>> Because when 1st user operation after system resume happens, Gfx
> > > >>>> already finishes resuming and audio initialization, so as long as
> > > >>>> intel_haswell_wait_ready_to_resume() enable the unsol event, the
> > > >>>> unsol event comes and so so waiting finishes. The 300ms time out
> > > >>>> is set for safety consideration in case unsol event is lost. I've not
> > observed any unsol event lost till now.
> > > >>>>
> > > >>>> As for the timeout, I suggest you use the codec->bus->workq
> > > >>>> instead of creating a new workq. I think that will also give us
> > > >>>> some serialisation, i e, protection against race conditions if
> > > >>>> the timeout happen at the same time as the unsol event.
> > > >>>
> > > >>> Hi David,
> > > >>>
> > > >>> The new added "resume_wq" for hdmi codec is a wait queue, not a work
> > queue like codec->bus->workq.
> > > >>> It's expected to wake up as soon as the unsol event is got.
> > > >>
> > > >> Sure; but I don't see why you need a wait queue for that? Why don't
> > > >> you just call the resume path from the unsol event handler
> > > >> (hdmi_present_sense, or its caller), and then also cancel the
> > > >> timeout handler (which can then be in the normal workq)?
> > > >
> > > > Because the delayed resume actually fakes as if the resume is done.
> > > > This is necessary not to block other device's resume operation.
> > > >
> > > > Since it looks as if ready, user-space might restart things soon
> > > > again before the delayed resume is really finished.  So, some
> > > > serialization is required there.
> > >
> > > Lin, would it be possible to add some chain of events description to
> > > the bug commit? The different contexts are just boggling my mind :-)
> > 
> > Yeah, a nice code flow picture would be helpful :)
> > 
> > > Assume we have two cases, system idle after S3 and immediate playback
> > > after S3.
> > >
> > > Is this correct:
> > >
> > > System idle:
> > >   1. System skips hda_call_codec_resume and sets codec->resume_delayed.
> > >   2. Since the codec is not reinitialized, nothing powers up the
> > > codec, so this is all that happens.
> > >   3. Since unsol events are not enabled (?), we have a bug that jack
> > > detection does not work and cannot be detected from userspace...?
> 
> Yes. This is a side effect of this patch. The unsol event is delayed until the user operation triggers the delayed resume.
> Then intel_haswell_wait_ready_to_resume() will enable unsol events.
> 
> > >
> > > Immediate playback:
> > >   1. System skips hda_call_codec_resume and sets codec->resume_delayed.
> > >   2. Process context wants to start playback, which powers up the
> > > codec and calls intel_haswell_set_power_state.
> > >   3. intel_haswell_wait_ready_to_resume enables unsol events and
> > > starts to wait on ready_to_resume
> > >   4. Gfx init finishes, and workq context fires the unsol event, which
> > > calls hdmi_intrinsic_event, which triggers the resume_wq.
> > >   5. Process context and workq context now continues in parallel,
> > > potentially (but hopefully not) leading to race conditions?
> > 
> > The current patch has a small race, but it can be avoided by clearing
> > spec->ready_to_resume early enough, e.g. in suspend callback or in
> > init callback, like the patch below.
> > 
> 
> I think there is no race for Haswell. In my test I observed that the unsol evnet will not be received
> until intel_haswell_wait_ready_to_resume() enables the unsol event on the pins.
> So I put "spec->ready_to_resume = 0" before enabling the unsol event. 

Practically your patch would work well, but theoretically the unsol
event might be issued and handled by hdmi_intrinsic_event() before you
go into intel_haswell_wait_ready_to_resume().  That is, you _clear_
spec->ready_to_resume again and wait for the unsol event that had been
already processed.  That's the race David pointed.

The fix is simply not to clear spec->ready_to_resume in
intel_haswell_wait_ready_to_resume() but clear it much earlier already
before the resume path as my patch does.  Then wait_event() passes
immediately (and the unsol enablement of is anyway harmless).

Takashi
_______________________________________________
Alsa-devel mailing list
Alsa-devel@xxxxxxxxxxxxxxxx
http://mailman.alsa-project.org/mailman/listinfo/alsa-devel