On Tue, Apr 25, 2023 at 11:38:32AM +0800, Hillf Danton wrote:
> On 24 Apr 2023 22:18:26 +0300 Fedor Pchelkin <pchelkin@xxxxxxxxx>
> > Currently, the synchronization between ath9k_wmi_cmd() and
> > ath9k_wmi_ctrl_rx() is exposed to a race condition which, although being
> > rather unlikely, can lead to invalid behaviour of ath9k_wmi_cmd().
> >
> > Consider the following scenario:
> >
> > CPU0                                    CPU1
> >
> > ath9k_wmi_cmd(...)
> >   mutex_lock(&wmi->op_mutex)
> >   ath9k_wmi_cmd_issue(...)
> >   wait_for_completion_timeout(...)
> >   ---
> >   timeout
> >   ---
> >                                         /* the callback is being processed
> >                                          * before last_seq_id became zero
> >                                          */
> >                                         ath9k_wmi_ctrl_rx(...)
> >                                           spin_lock_irqsave(...)
> >                                           /* wmi->last_seq_id check here
> >                                            * doesn't detect timeout yet
> >                                            */
> >                                           spin_unlock_irqrestore(...)
> >   /* last_seq_id is zeroed to
> >    * indicate there was a timeout
> >    */
> >   wmi->last_seq_id = 0
>
> Without wmi->wmi_lock held, updating last_seq_id on the waiter side
> means it is random on the waker side, so the fix below is incorrect.
>

Thank you for noticing! Of course that should be done.

> >   mutex_unlock(&wmi->op_mutex)
> >   return -ETIMEDOUT
> >
> > ath9k_wmi_cmd(...)
> >   mutex_lock(&wmi->op_mutex)
> >   /* the buffer is replaced with
> >    * another one
> >    */
> >   wmi->cmd_rsp_buf = rsp_buf
> >   wmi->cmd_rsp_len = rsp_len
> >   ath9k_wmi_cmd_issue(...)
> >     spin_lock_irqsave(...)
> >     spin_unlock_irqrestore(...)
> >   wait_for_completion_timeout(...)
> >                                         /* the continuation of the
> >                                          * callback left after the first
> >                                          * ath9k_wmi_cmd call
> >                                          */
> >                                           ath9k_wmi_rsp_callback(...)
> >                                             /* copying data designated
> >                                              * to already timeouted
> >                                              * WMI command into an
> >                                              * inappropriate wmi_cmd_buf
> >                                              */
> >                                             memcpy(...)
> >                                             complete(&wmi->cmd_wait)
> >   /* awakened by the bogus callback
> >    * => invalid return result
> >    */
> >   mutex_unlock(&wmi->op_mutex)
> >   return 0
> >
> > To fix this, move ath9k_wmi_rsp_callback() under wmi_lock inside
> > ath9k_wmi_ctrl_rx() so that the wmi->cmd_wait can be completed only for
> > initially designated wmi_cmd call, otherwise the path would be rejected
> > with last_seq_id check.
> >
> > Also move recording the rsp buffer and length into ath9k_wmi_cmd_issue()
> > under the same wmi_lock with last_seq_id update to avoid their racy
> > changes.
>
> Better in a seperate one.

Well, they are parts of the same problem, but now it seems more
appropriate to split the patch in two: the first one for the incorrect
last_seq_id synchronization and the second one for recording the rsp
buffer under the lock.

Thanks!
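
To make the intended locking rule concrete, here is a minimal userspace
sketch of the invariant the two patches are meant to establish. It is only
a model under stated assumptions, not the driver code: struct wmi_model and
the model_*() functions are hypothetical names, a pthread mutex stands in
for wmi->wmi_lock, and a plain flag stands in for the completion. Every
access to last_seq_id, the rsp buffer and its length happens under the same
lock, including the reset on timeout and the copy on the receive side.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model; these are not the ath9k structures. */
struct wmi_model {
	pthread_mutex_t lock;	/* stands in for wmi->wmi_lock */
	uint16_t last_seq_id;	/* 0 means no command is outstanding */
	void *rsp_buf;
	size_t rsp_len;
	int completed;		/* stands in for complete(&wmi->cmd_wait) */
};

/* Record the sequence id together with the rsp buffer, under one lock. */
void model_cmd_issue(struct wmi_model *w, uint16_t seq, void *buf, size_t len)
{
	assert(seq != 0);	/* 0 is reserved for "nothing outstanding" */
	pthread_mutex_lock(&w->lock);
	w->last_seq_id = seq;
	w->rsp_buf = buf;
	w->rsp_len = len;
	w->completed = 0;
	pthread_mutex_unlock(&w->lock);
}

/* Waiter side on timeout: the reset is done with the lock held as well. */
void model_cmd_timeout(struct wmi_model *w)
{
	pthread_mutex_lock(&w->lock);
	w->last_seq_id = 0;
	pthread_mutex_unlock(&w->lock);
}

/*
 * Waker side: the seq id check, the copy and the completion form one
 * critical section, so a response for a timed-out command is dropped.
 * Returns 1 if the response was accepted, 0 if it was rejected.
 */
int model_ctrl_rx(struct wmi_model *w, uint16_t seq, const void *data, size_t len)
{
	int accepted = 0;

	pthread_mutex_lock(&w->lock);
	if (seq != 0 && seq == w->last_seq_id) {
		size_t n = len < w->rsp_len ? len : w->rsp_len;

		memcpy(w->rsp_buf, data, n);
		w->completed = 1;
		accepted = 1;
	}
	pthread_mutex_unlock(&w->lock);
	return accepted;
}
```

Replaying the scenario above against this model (issue seq 1, time out,
issue seq 2, then deliver the late response for seq 1) makes
model_ctrl_rx() return 0 and leaves the second command's buffer untouched;
only a response carrying seq 2 is copied and completes the wait.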