virtio emits a few times: [ 404.XXXXXX] No more buffers in queue and then [ 404.588722] remoteproc remoteproc0: vq index 1 is interrupted [ 404.597249] virtqueue interrupt with no work for 6d53f13aDifficult to understand the scheduling with your trace. Could you enable ftrace to observe it (mailbox interrupts,rpmsg and virtio functions)
does this help (it's a trace that a coworker did a few days ago)? ``` root:.../kernel/tracing# cat trace | head -50 # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 10.250 us | rpmsg_sg_init(); 0) + 42.375 us | } /* rpmsg_send_offchannel_raw */ 0) + 51.125 us | } /* virtio_rpmsg_send */ 0) + 59.625 us | } /* rpmsg_send */ ------------------------------------------ 0) stm32mp-414 => kworker-365 ------------------------------------------ 0) | rpmsg_recv_done() { 0) | rpmsg_recv_single() { 0) 5.542 us | rpmsg_sg_init(); 0) + 36.542 us | } 0) + 48.250 us | } ------------------------------------------ 0) kworker-365 => stm32mp-414 ------------------------------------------ 0) 7.000 us | rpmsg_get_buffer_size(); 0) | rpmsg_send() { 0) | virtio_rpmsg_send() { 0) | rpmsg_send_offchannel_raw() { 0) 5.625 us | rpmsg_sg_init(); 0) + 37.333 us | } 0) + 45.916 us | } 0) + 54.250 us | } ------------------------------------------ 0) stm32mp-414 => kworker-365 ------------------------------------------ 0) | rpmsg_recv_done() { 0) | rpmsg_recv_single() { 0) 6.041 us | rpmsg_sg_init(); 0) + 37.292 us | } 0) + 49.542 us | } ------------------------------------------ 0) kworker-365 => stm32mp-414 ------------------------------------------ 0) 6.500 us | rpmsg_get_buffer_size(); 0) | rpmsg_send() { 0) | virtio_rpmsg_send() { 0) | rpmsg_send_offchannel_raw() { 0) 5.417 us | rpmsg_sg_init(); 0) + 36.583 us | } 0) + 45.167 us | } 0) + 53.583 us | } ```apparently this is where it gets stuck (not sure if it's enough context, though).
If yes what is not clear to me is that wait_event_interruptible_timeout() seems to test the condition (so call get_a_tx_buf()) before entering in sleep[1]. A free TX buffer should be found at this step. [1]https://elixir.bootlin.com/linux/latest/source/include/linux/wait.h#L534hmm, interesting. but that would imply a timing issue where calling get_a_tx_buf twice somehow makes a difference as opposed to calling it only once. would the "interrupt with no work" point to a different cause for the issue that we are seeing?Look like you have not released the buffer on remote side so no buffer in vring used list. Do you use the zero copy (rpmsg_hold_rx_buffer) on remote processor? If yes, please check that you have following fix: https://github.com/OpenAMP/open-amp/commit/2b6a38a9c6890869960ed41d6a1dc9930c079e0d
yes, we use rpmsg_hold_rx_buffer, however the fix from 2b6a38a9c6890869960ed41d6a1dc9930c079e0d apparently did not change the behaviour.
many thanks, tim
Attachment:
OpenPGP_signature
Description: OpenPGP digital signature