On 11 April 2014 07:47, Michal Kazior <michal.kazior@xxxxxxxxx> wrote: > On 11 April 2014 07:40, Kalle Valo <kvalo@xxxxxxxxxxxxxxxx> wrote: >> Michal Kazior <michal.kazior@xxxxxxxxx> writes: >> >>> If for some reason copy engine ring buffer became >>> corrupt ath10k could crash the machine due to >>> invalid pointer dereference. It's very unlikely >>> but devices can never be fully trusted so verify >>> if the bmi xfer pointer read back from copy engine >>> matches the original pointer. >> >> The big question is why does this happen? Does this happen only with >> Ben's firmware or is it a more generic problem? > > I'll look more into this. Hmm.. After going through the code a few times I think the bug is actually something else. If the crash happened in complete() it means the completion structure wasn't set up properly. However it is always initialized in ath10k_pci_hif_exchange_bmi_msg() before proceeding. This means xfer pointer read back from ath10k_ce_completed_send_next() or ath10k_ce_completed_recv_next() is wrong. Since the pointer to it is kept on host getting wrong xfer means sw_index must be wrong. If I assume indexes are managed correctly (i.e. no overflows, locking is fine) then it is the entry the sw_index points to that is actually unexpected. This could happen if concurrent transfers occur on pipe 0 or 1 (used for bmi communication) during device bootup. This could be either a concurrent bmi transfer or a non-bmi buffer or an old bmi xfer data. The latter shouldn't be the case because ath10k_pci_hif_exchange_bmi_msg() cleans up after itself. Concurrent access doesn't seem to be the case either. I conclude the bug is not present in the vanilla ath10k code. TL;DR drop the patch, please Michał -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html