On 9/4/23 22:43, Mathieu Poirier wrote: > On Mon, Sep 04, 2023 at 03:52:56PM +0200, Arnaud POULIQUEN wrote: >> Hello Tim, >> >> On 9/4/23 10:36, Tim Blechmann wrote: >>> when we cannot get a tx buffer (`get_a_tx_buf`) `rpmsg_upref_sleepers` >>> enables tx-complete interrupt. >>> however if the interrupt is executed after `get_a_tx_buf` and before >>> `rpmsg_upref_sleepers` we may mis the tx-complete interrupt and sleep >>> for the full 15 seconds. >> >> >> Is there any reason why your co-processor is unable to release the TX RPMSG >> buffers for 15 seconds? If not, you should first determine the reason why it is >> stalled. > > Arnaud's concern is valid. If the remote processor can't consume a buffer > within 15 seconds, something is probably wrong. > > That said, I believe your assesment of the situation is correct. *If* the TX > callback is disabled and there is no buffer available, there is a window of > opportunity between calls to get_a_tx_buf() and rpmsg_upref_sleepers() for an > interrupt to arrive in function rpmsg_send_offchannel_raw(). > > From here three things need to happen: > > 1) You send another version of this patch with a changelong that uses proper > english, i.e capital letters when they are needed and no spelling mistake. > > 2) Arnaud confirms our suspicions. Seems to me that this patch is useless - wait_event_interruptible_timeout() function already seems to test the condition (so call get_a_tx_buf()) before entering in sleep[1]. - ftraces show that vq interrupt is not called during the 15-second period. So it is a normal behavior that the vrp->sendq is never waked-up. Tim needs to analyze the reason why no mailbox interrupt occurs. [1]https://elixir.bootlin.com/linux/latest/source/include/linux/wait.h#L534 > > 3) This patch gets applied when rc1 comes out so that it has 6 or 7 weeks to > soak. No error are locks are reported due to this patch during that time. > >> >> Regards, >> Arnaud >> >>> >>> in this case, so we re-try once before we really start to sleep >>> >>> Signed-off-by: Tim Blechmann <tim@xxxxxxxxxx> >>> --- >>> drivers/rpmsg/virtio_rpmsg_bus.c | 24 +++++++++++++++--------- >>> 1 file changed, 15 insertions(+), 9 deletions(-) >>> >>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c >>> index 905ac7910c98..2a9d42225e60 100644 >>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c >>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c >>> @@ -587,21 +587,27 @@ static int rpmsg_send_offchannel_raw(struct rpmsg_device *rpdev, >>> >>> /* no free buffer ? wait for one (but bail after 15 seconds) */ >>> while (!msg) { >>> /* enable "tx-complete" interrupts, if not already enabled */ >>> rpmsg_upref_sleepers(vrp); >>> >>> - /* >>> - * sleep until a free buffer is available or 15 secs elapse. >>> - * the timeout period is not configurable because there's >>> - * little point in asking drivers to specify that. >>> - * if later this happens to be required, it'd be easy to add. >>> - */ >>> - err = wait_event_interruptible_timeout(vrp->sendq, >>> - (msg = get_a_tx_buf(vrp)), >>> - msecs_to_jiffies(15000)); >>> + /* make sure to retry to grab tx buffer before we start waiting */ >>> + msg = get_a_tx_buf(vrp); >>> + if (msg) { >>> + err = 0; >>> + } else { >>> + /* >>> + * sleep until a free buffer is available or 15 secs elapse. >>> + * the timeout period is not configurable because there's >>> + * little point in asking drivers to specify that. >>> + * if later this happens to be required, it'd be easy to add. >>> + */ >>> + err = wait_event_interruptible_timeout(vrp->sendq, >>> + (msg = get_a_tx_buf(vrp)), >>> + msecs_to_jiffies(15000)); >>> + } >>> >>> /* disable "tx-complete" interrupts if we're the last sleeper */ >>> rpmsg_downref_sleepers(vrp); >>> >>> /* timeout ? */ >>> if (!err) {