Hi,
On 14-03-18 23:38, Lukas Wunner wrote:
On Wed, Mar 14, 2018 at 11:23:12PM +0100, Hans de Goede wrote:
On 14-03-18 23:16, Lukas Wunner wrote:
On Wed, Mar 14, 2018 at 11:06:02PM +0100, Hans de Goede wrote:
This reverts commit 43fff7683468 ("Bluetooth: hci_bcm: Streamline runtime
PM code"). The commit msg for this commit states "No functional change
intended.", but replacing:
pm_runtime_get();
pm_runtime_mark_last_busy();
pm_runtime_put_autosuspend();
with:
pm_request_resume();
does result in a functional change: pm_request_resume() only calls
pm_runtime_mark_last_busy() if the device was suspended before the call.
Yes, Robert Howell (cc) reported this a few days ago:
https://bugzilla.kernel.org/show_bug.cgi?id=198953
I've worked with him to develop a fix which is better IMHO than a revert,
namely he's replacing the pm_request_resume() in bcm_recv() with
pm_runtime_mark_last_busy(), and the pm_request_resume() in the interrupt
handler can stay. He says that fixes the issue for him.
It makes the race window a lot smaller, but it still leaves a race:
1) some data comes in, gets fully read from the device
2) 4.9999 seconds elapse since last byte has been read
3) new data comes in, triggers IRQ, IRQ does nothing because runtime suspend
has not yet kicked in
4) runtime suspend kicks in, disabling the uart before the first new byte is received
5) stuck again
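The steps above can be sketched as a toy userspace model. This is not kernel code: model_dev, model_race() and the other model_* names are hypothetical stand-ins, the 5000 ms delay is an assumption taken from the "4.9999 seconds" in step 2, and pm_request_resume() is modelled only as described in this thread (it marks the device busy only if it was suspended before the call).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, NOT kernel code: every name here is hypothetical. */
struct model_dev {
	bool suspended;
	long last_busy_ms;	/* time of the last "mark last busy" */
	long autosuspend_ms;	/* assumed autosuspend delay */
};

/* Models pm_runtime_mark_last_busy(): restart the autosuspend countdown. */
void model_mark_last_busy(struct model_dev *d, long now_ms)
{
	assert(now_ms >= d->last_busy_ms);	/* time only moves forward */
	d->last_busy_ms = now_ms;
}

/*
 * Models pm_request_resume() as described in the thread: only a device
 * that was suspended gets resumed and marked busy.
 */
void model_request_resume(struct model_dev *d, long now_ms)
{
	if (!d->suspended)
		return;		/* already active: last_busy is left untouched */
	d->suspended = false;
	model_mark_last_busy(d, now_ms);
}

/* Models the autosuspend timer check at time now_ms. */
void model_autosuspend_check(struct model_dev *d, long now_ms)
{
	if (!d->suspended && now_ms - d->last_busy_ms >= d->autosuspend_ms)
		d->suspended = true;
}

/* Runs steps 1) to 4); returns true if the device ends up suspended. */
bool model_race(bool mark_busy_in_irq)
{
	struct model_dev d = {
		.suspended = false,
		.last_busy_ms = 0,
		.autosuspend_ms = 5000,
	};

	model_mark_last_busy(&d, 0);		/* 1) data fully read at t=0 */
	if (mark_busy_in_irq)			/* 3) IRQ at t=4999 ms */
		model_mark_last_busy(&d, 4999);
	model_request_resume(&d, 4999);
	model_autosuspend_check(&d, 5000);	/* 4) timer expires at t=5000 ms */
	return d.suspended;
}
```

In this model, model_race(false) leaves the device suspended with the new data still pending, while model_race(true) keeps it active because the IRQ restarts the countdown. It only illustrates the ordering problem and says nothing about locking or the real timer implementation.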
Hm okay, but a call to pm_runtime_mark_last_busy() before the
pm_request_resume() should avoid that. Actually I'm wondering
why we're not calling pm_runtime_mark_last_busy() in rpm_resume()
if the device was already resumed as clearly an action is requested
from it. That needs to be investigated separately.
I hope he'll submit the patch shortly.
We're quite far into the cycle already and this is a serious regression.
Nothing of great value is lost by the revert: the original commit
was a minor cleanup which turns out to have bad side-effects. A simple
revert really is the best solution here, esp. at this point in the cycle.
Just an hour ago he sent me the patch to look over it. And we're at
least two and a half weeks away from v4.16.
No, we are *only* two and a half weeks away from v4.16 (worst case scenario)
and Linus does not like getting last minute fixes.
I really see no good reason not to fix this with a simple revert, esp.
since, as my explanation of the race condition in the fix he sent you
shows, getting this right is non-trivial. Falling back to the code before
the troublesome commit gives us a known working solution, at 0 cost (as
the reverted commit was just a code cleanup, no functionality is lost).
Anyways this is Marcel's call now.
Regards,
Hans