many thanks for your analysis, very interesting.
please find below an extract of your trace with my analysis: stm32mp1_bulk_p-390 [001] ..... 907.241226: rpmsg_send <-rpmsg_intercore_send_buffer.constprop.0 stm32mp1_bulk_p-390 [001] ..... 907.241228: virtio_rpmsg_send <-rpmsg_send stm32mp1_bulk_p-390 [001] ..... 907.241237: virtqueue_enable_cb <-rpmsg_send_offchannel_raw stm32mp1_bulk_p-390 [001] ..... 907.241239: virtqueue_enable_cb_prepare At this point seems that no more TX-buffer <-rpmsg_recv_single kworker/0:4-67 [000] ..... 907.242533: vring_interrupt <-rproc_vq_interrupt kworker/0:4-67 [000] ..... 907.242536: rpmsg_xmit_done Here you receive an interrupt indicating that TX buffer has been released by the remote. that's the expected behavior. <-rpmsg_send_offchannel_raw stm32mp1_bulk_p-390 [000] ..... 984.054941: rpmsg_send <-rpmsg_intercore_send_buffer.constprop.0 stm32mp1_bulk_p-390 [000] ..... 984.054943: virtio_rpmsg_send <-rpmsg_send stm32mp1_bulk_p-390 [000] ..... 984.054956: virtqueue_enable_cb <-rpmsg_send_offchannel_raw stm32mp1_bulk_p-390 [000] ..... 984.054958: virtqueue_enable_cb_prepare <-virtqueue_enable_cb stm32mp1_bulk_p-390 [000] ..... 999.398667: virtqueue_disable_cb <-rpmsg_send_offchannel_raw stm32mp1_bulk_p-390 [000] ..... 999.414840: rpmsg_send <-rpmsg_intercore_send_buffer.constprop.0 stm32mp1_bulk_p-390 [000] ..... 999.414843: virtio_rpmsg_send <-rpmsg_send stm32mp1_bulk_p-390 [000] ..... 999.414855: virtqueue_enable_cb <-rpmsg_send_offchannel_raw stm32mp1_bulk_p-390 [000] ..... 999.414857: virtqueue_enable_cb_prepare Here you have again no more TX buffer. From this point there is no more activity neither in TX nor in RX until the timeout of 15 seconds. If you have a look to rproc_vq_interrupt the last one occurs at 907.242533 As you have no more virtqueue interrupts call for both directions, the issue is probably either in the Cortex-M firmware, which seems to be stalled, or due to a disable of the IRQs in Linux.
afaict we can rule out a complete stall of the cortex-m firmware: if we change the rpmsg_send to a rpmsg_trysend/msleep loop, the trysend will succeed to get a buffer after a few iterations.
or due to a disable of the IRQs in Linux.
do you have some recommendations how we could trace this? many thanks, tim
<-virtqueue_enable_cb stm32mp1_bulk_p-390 [000] ..... 1014.758678: virtqueue_disable_cb <-rpmsg_send_offchannel_raw stm32mp1_bulk_p-390 [000] ..... 1014.774802: rpmsg_send <-rpmsg_intercore_send_buffer.constprop.0 stm32mp1_bulk_p-390 [000] ..... 1014.774804: virtio_rpmsg_send <-rpmsg_send stm32mp1_bulk_p-390 [000] ..... 1014.774815: virtqueue_enable_cb
Attachment:
OpenPGP_signature
Description: OpenPGP digital signature