Re: [PATCH 1/1] rpmsg: virtio_rpmsg_bus - prevent possible race condition

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 4 Sept 2023 at 19:33, Tim Blechmann <tim@xxxxxxxxxx> wrote:
>
> >>> when we cannot get a tx buffer (`get_a_tx_buf`) `rpmsg_upref_sleepers`
> >>> enables tx-complete interrupt.
> >>> however if the interrupt is executed after `get_a_tx_buf` and before
> >>> `rpmsg_upref_sleepers` we may mis the tx-complete interrupt and sleep
> >>> for the full 15 seconds.
> >>
> >>
> >> Is there any reason why your co-processor is unable to release the TX RPMSG
> >> buffers for 15 seconds? If not, you should first determine the reason why it is
> >> stalled.
> >
> > Arnaud's concern is valid.  If the remote processor can't consume a buffer
> > within 15 seconds, something is probably wrong.
> >
> > That said, I believe your assesment of the situation is correct.  *If* the TX
> > callback is disabled and there is no buffer available, there is a window of
> > opportunity between calls to get_a_tx_buf() and rpmsg_upref_sleepers() for an
> > interrupt to arrive in function rpmsg_send_offchannel_raw().
>
> the remote processor certainly releases the tx buffer and according to
> my tracing the `vring_interrupt` fires immediately before `rpmsg_send`
> enters the `rpmsg_upref_sleepers`.
>
> after applying this patch we haven't been able to reproduce the 15s
> timeout anymore, whereas before we could easily reproduce it with
> certain workloads.
>
> > 3) This patch gets applied when rc1 comes out so that it has 6 or 7 weeks to
> > soak.  No error are locks are reported due to this patch during that time.
>
> mentioning locks: i was a bit uncertain about a good way to implement
> the retry, since both `rpmsg_upref_sleepers` and `get_a_tx_buf` both
> acquire the same mutex. i briefly considered to add `get_a_tx_buf` into
> `rpmsg_upref_sleepers` to avoid locking the same mutex multiple times,
> though it adds a bit of complexity to the implementation and harms
> readability a bit.

The current implementation looks good to me.

> are there any recommendations on this topic or are (likely
> non-contended) locks not expensive enough to justify the added complexity?
>

Since we are changing how buffers are delivered to applications in a
highly concurrent environment, it is possible for this patch to
introduce unintended side effects, something I want to avoid as much
as possible.

I am not worried about contentions for the vrp->tx_lock.

> thanks,
> tim
>
>
> >
> >>
> >> Regards,
> >> Arnaud
> >>
> >>>
> >>> in this case, so we re-try once before we really start to sleep
> >>>
> >>> Signed-off-by: Tim Blechmann <tim@xxxxxxxxxx>
> >>> ---
> >>>   drivers/rpmsg/virtio_rpmsg_bus.c | 24 +++++++++++++++---------
> >>>   1 file changed, 15 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
> >>> index 905ac7910c98..2a9d42225e60 100644
> >>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
> >>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
> >>> @@ -587,21 +587,27 @@ static int rpmsg_send_offchannel_raw(struct rpmsg_device *rpdev,
> >>>
> >>>     /* no free buffer ? wait for one (but bail after 15 seconds) */
> >>>     while (!msg) {
> >>>             /* enable "tx-complete" interrupts, if not already enabled */
> >>>             rpmsg_upref_sleepers(vrp);
> >>>
> >>> -           /*
> >>> -            * sleep until a free buffer is available or 15 secs elapse.
> >>> -            * the timeout period is not configurable because there's
> >>> -            * little point in asking drivers to specify that.
> >>> -            * if later this happens to be required, it'd be easy to add.
> >>> -            */
> >>> -           err = wait_event_interruptible_timeout(vrp->sendq,
> >>> -                                   (msg = get_a_tx_buf(vrp)),
> >>> -                                   msecs_to_jiffies(15000));
> >>> +           /* make sure to retry to grab tx buffer before we start waiting */
> >>> +           msg = get_a_tx_buf(vrp);
> >>> +           if (msg) {
> >>> +                   err = 0;
> >>> +           } else {
> >>> +                   /*
> >>> +                    * sleep until a free buffer is available or 15 secs elapse.
> >>> +                    * the timeout period is not configurable because there's
> >>> +                    * little point in asking drivers to specify that.
> >>> +                    * if later this happens to be required, it'd be easy to add.
> >>> +                    */
> >>> +                   err = wait_event_interruptible_timeout(vrp->sendq,
> >>> +                                           (msg = get_a_tx_buf(vrp)),
> >>> +                                           msecs_to_jiffies(15000));
> >>> +           }
> >>>
> >>>             /* disable "tx-complete" interrupts if we're the last sleeper */
> >>>             rpmsg_downref_sleepers(vrp);
> >>>
> >>>             /* timeout ? */
> >>>             if (!err) {
> >
>



[Index of Archives]     [Linux Sound]     [ALSA Users]     [ALSA Devel]     [Linux Audio Users]     [Linux Media]     [Kernel]     [Photo Sharing]     [Gimp]     [Yosemite News]     [Linux Media]

  Powered by Linux