>>> And indeed there's a deadlock! >>> >>> snd_pcm_period_elapsed() takes the FE pcm stream lock, and will call >>> snd_pcm_trigger. >> >> Indeed, this would deadlock. >> >>> So if we also take the pcm stream lock in the BE >>> trigger, there's a conceptual deadlock: we painted ourselves in a corner >>> by using the same lock twice. >>> >>> Takashi, are you really sure we should protect these for_each_dpcm_be() >>> loops with the same pcm lock? >> >> The call within the FE lock is done only in dpcm_dai_trigger_fe_be(), >> and this should call dpcm_be_dai_trigger() as is. In other places, >> the calls are without FE lock, hence they can take the lock, >> e.g. create a variant dpcm_dai_trigger_fe_be_lock(). > > Or rather it was the code path of snd_soc_dpcm_check_state()? > The function could be called from dpcm_be_dai_trigger(). > Currently dpcm_lock seems to be added at places casually covering some > of the for_each_dpcm_be() or whatever... The whole lock scheme has to > be revisited. > >>> it seems like asking for trouble to >>> revisit the ALSA core just to walking through a list of BEs? Would you >>> object to changing dpcm_lock as I suggested, but not interfering with >>> stream handling? >> >> That would work, too, it's just a pity to degrade the fine-grained >> locks that have been already taken into global locks... > > For the performance problem, at least, you can make it rwlock and > rwsem types (depending on nonatomic) so that the concurrent accesses > would work. The only exclusive lock is the case for adding and > removing the entries, which should be done with write lock / sem down, > while other link traversals can be executed in the read lock / sem. Thanks Takashi for your feedback. Let's first get the locking right. We can optimize performance later. I will continue with the idea of using existing fine-grained locks a bit more, I am with you that this dpcm_lock was not added in a consistent way and reusing the concept is really building on sand. We can remove the lock in snd_soc_dpcm_check_state, I already did the change in other versions. But what I'll need to check further is if indeed dpcm_be_dai_trigger() is called with the FE lock taken already. Adding a lockdep_assert_help() or something would help I guess. The part where I am still not clear is that snd_pcm_period_elapsed uses the irqsave/irqrestore version, but in the initial code you suggested the vanilla irq version is fine. I am not sure I get the nuance here, and btw in the case of SOF the snd_pcm_period_elapsed is called from a workqueue, not an interrupt handler, as a work-around to avoid an IPC deadlock, so we probably don't need the irqsave/irqrestore version anyways.