On Tue, Nov 05, 2024 at 10:26:40AM -0800, Chris Lew wrote:
> On 11/4/2024 9:08 PM, Johan Hovold wrote:
> > On Mon, Nov 04, 2024 at 04:26:15PM -0800, Chris Lew wrote:

> >> This looks like the null pointer would happen if qrtr tried to send
> >> before mhi_channel_prepare() is called.
> >> I think we have a patch that might fix this, let me dig it up and send
> >> it out.
> >
> > Would that patch still help?
> >
> > 	https://lore.kernel.org/lkml/20241104-qrtr_mhi-v1-1-79adf7e3bba5@xxxxxxxxxxx/
>
> Yea this is the exact patch I had in mind, didnt realize the patch was
> already sent a while back.

Heh, that's a bit of an understatement. Apparently the fix was posted
three years ago, but no one followed up with a v2:

	https://lore.kernel.org/lkml/1626831778-31796-1-git-send-email-bbhatt@xxxxxxxxxxxxxx/

> > I naively tried adding a sleep after registering the endpoint, but that
> > is at least not sufficient to trigger the NULL-deref.
>
> Looking at the callstack, this is broadcasting a NEW_SERVER notification
> from qrtr_ns. I think you can force this by starting and stopping some
> qmi service with the added sleep. Do you have tqftpserv or diag-router
> in your environment? Those will open qmi services, so starting and
> stopping those will cause the new_server broadcast in qrtr_ns.

No, neither tqftpserv nor diag-router is used here, but after digging
through the code it seems my hunch about this being related to the
in-kernel pd-mapper was correct.

The qrtr worker, qrtr_ns_worker(), is called when the in-kernel
pd-mapper adds the server, and processing the QRTR_TYPE_NEW_SERVER
command eventually ends up in mhi_gen_tre() for the modem:

	[    9.026694] qcom_pdm_start - adding server
	[    9.034684] ctrl_cmd_new_server - service = 0x40, instance = 0x101, node_id = 1, port = 0
	[    9.042155] mhi-pci-generic 0005:01:00.0: mhi_gen_tre - buf_info = ffff800080d4d038, offset_of(buf_info->used) = 34

	[   10.669996] Call trace:
	[   10.787734]  mhi_gen_tre+0x218/0x270 [mhi]
	[   10.804727]  mhi_queue+0x74/0x194 [mhi]
	[   10.804730]  mhi_queue_skb+0x5c/0x8c [mhi]
	[   10.804732]  qcom_mhi_qrtr_send+0x6c/0x160 [qrtr_mhi]
	[   10.804734]  qrtr_node_enqueue+0xd0/0x4a0 [qrtr]
	[   10.804736]  qrtr_bcast_enqueue+0x78/0xe8 [qrtr]
	[   10.804737]  qrtr_sendmsg+0x15c/0x33c [qrtr]
	[   10.804739]  sock_sendmsg+0xc0/0xec
	[   10.804742]  kernel_sendmsg+0x30/0x40
	[   10.804743]  service_announce_new+0xbc/0x1c4 [qrtr]
	[   10.804745]  qrtr_ns_worker+0x754/0x7d4 [qrtr]

And I can indeed imagine that leading to the NULL deref in case the
endpoint is registered before being fully set up.

Johan
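
For illustration only, the suspected ordering problem (endpoint
registered, and therefore reachable by senders, before the channel is
prepared) can be modelled in user space roughly as below. This is a toy
sketch: every toy_* name is hypothetical and none of it is the real
MHI or qrtr API. I have also not verified which pointer mhi_gen_tre()
actually dereferences; the model simply assumes the ring is unset until
the prepare step has run.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model; all toy_* names are hypothetical. */
struct toy_buf_info {
	bool used;
};

struct toy_channel {
	struct toy_buf_info *ring;	/* assumed NULL until toy_prepare() */
	size_t len;
	size_t wp;			/* write index into the ring */
};

struct toy_endpoint {
	struct toy_channel *chan;
	bool registered;		/* true once senders can reach us */
};

#define TOY_RING_LEN 4
static struct toy_buf_info toy_ring_storage[TOY_RING_LEN];

/* Stand-in for the channel prepare step: sets up the ring. */
static void toy_prepare(struct toy_channel *chan)
{
	chan->ring = toy_ring_storage;
	chan->len = TOY_RING_LEN;
	chan->wp = 0;
}

/* Stand-in for endpoint registration: makes the endpoint visible. */
static void toy_register(struct toy_endpoint *ep, struct toy_channel *chan)
{
	ep->chan = chan;
	ep->registered = true;
}

/*
 * Stand-in for the send path. The model returns -EAGAIN where an
 * unguarded implementation would dereference the unset ring.
 */
static int toy_send(struct toy_endpoint *ep)
{
	struct toy_channel *chan;

	if (!ep->registered)
		return -ENODEV;

	chan = ep->chan;
	if (!chan->ring)
		return -EAGAIN;	/* registered but not yet prepared */

	chan->ring[chan->wp].used = true;
	chan->wp = (chan->wp + 1) % chan->len;

	return 0;
}
```

Calling toy_register() and then toy_send() before toy_prepare() hits
the -EAGAIN branch, which is the window in question. Calling
toy_prepare() before toy_register() closes that window in the model.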