Fedor Pchelkin <pchelkin@xxxxxxxxx> writes:

> On Wed, Apr 26, 2023 at 07:07:08AM +0800, Hillf Danton wrote:
>> Given similar wait timeout[1], just taking lock on the waiter side is not
>> enough wrt fixing the race, because in case job done on the waker side,
>> waiter needs to wait again after timeout.
>>
>
> As I understand you correctly, you mean the case when a timeout occurs
> during ath9k_wmi_ctrl_rx() callback execution. I suppose if a timeout has
> occurred on a waiter's side, it should return immediately and doesn't have
> to care in which state the callback has been at that moment.
>
> AFAICS, this is controlled properly with taking a wmi_lock on waiter and
> waker sides, and there is no data corruption.
>
> If a callback has not managed to do its work entirely (performing a
> completion and subsequently waking waiting thread is included here), then,
> well, it is considered a timeout, in my opinion.
>
> Your suggestion makes a wmi_cmd call to give a little more chance for the
> belated callback to complete (although timeout has actually expired). That
> is probably good, but increasing a timeout value makes that job, too. I
> don't think it makes any sense on real hardware.
>
> Or do you mean there is data corruption that is properly fixed with your
> patch?
>
> That is, I agree there can be a situation when a callback makes all the
> logical work it should and it just hasn't got enough time to perform a
> completion before a timeout on waiter's side occurs. And this behaviour
> can be named "racy". But, technically, this seems to be a rather valid
> timeout.
>
>> [1] https://lore.kernel.org/lkml/9d9b9652-c1ac-58e9-2eab-9256c17b1da2@xxxxxxxxxxxxxxxxxxx/
>>
>
> I don't think it's a similar case because wait_for_completion_state() is
> interruptible while wait_for_completion_timeout() is not.

Ping, Hillf?

-Toke