Felix Fietkau <nbd@xxxxxxxx> writes: > On 2022-01-11 11:35, Kalle Valo wrote: >> Lorenzo Bianconi <lorenzo@xxxxxxxxxx> writes: >> >>> Fix a possible race in mt7921_pm_power_save_work() if rx/tx napi >>> schedules ps_work and we are currently accessing device register >>> on a different cpu. >>> >>> Fixes: 1d8efc741df8 ("mt76: mt7921: introduce Runtime PM support") >>> Signed-off-by: Lorenzo Bianconi <lorenzo@xxxxxxxxxx> >>> --- >>> drivers/net/wireless/mediatek/mt76/mt7921/mac.c | 8 ++++++++ >>> 1 file changed, 8 insertions(+) >>> >>> diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/mac.c b/drivers/net/wireless/mediatek/mt76/mt7921/mac.c >>> index defef3496246..0744f6e42ba3 100644 >>> --- a/drivers/net/wireless/mediatek/mt76/mt7921/mac.c >>> +++ b/drivers/net/wireless/mediatek/mt76/mt7921/mac.c >>> @@ -1553,6 +1553,14 @@ void mt7921_pm_power_save_work(struct work_struct *work) >>> dev->fw_assert) >>> goto out; >>> + if (mutex_is_locked(&dev->mt76.mutex)) >>> + /* if mt76 mutex is held we should not put the device >>> + * to sleep since we are currently accessing device >>> + * register map. We need to wait for the next power_save >>> + * trigger. >>> + */ >>> + goto out; >> >> This looks fishy to me. What protects the case when ps_work is run first >> and at the same time another cpu starts accessing the registers? >> >> Do note that I didn't check the code, so I might be missing something. > For atomic context there is a locked counter pm->wake.count which is > used to prevent the device from going to sleep. If the device is > sleeping already on irq/tx, it is woken up and the function is > rescheduled. The device is never put to sleep while the wake count is > non-zero. > > For non-atomic context, the mutex is always held. There is a wrapper > for acquiring and releasing the mutex, which cancels the work after > acquiring the mutex and reschedules the delayed work after updating > the last activity timestamp (which gets checked here after checking > the mutex). > > The corner case that needs this mutex check here is when the work was > scheduled again after processing some NAPI, tx or irq activity and the > work gets run all while another cpu is in the middle of a long running > non-atomic activity that holds the mutex. > > For that we really do need the simple mutex_is_locked check, since > actually holding the lock here would cause a deadlock with the > mutex_acquire wrapper. Very good, thanks Felix and Lorenzo for explaining all this. I just always get wary when I see mutex_is_locked() being used. -- https://patchwork.kernel.org/project/linux-wireless/list/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches