On Tue, Oct 01, 2024 at 04:14:50PM +0530, Kiran K wrote: > The following handshake mechanism needs be followed after firmware > download is completed to bring the firmware to running state. > > After firmware fragments of Operational image are downloaded and > secure sends result of the image succeeds, > > 1. Driver sends HCI Intel reset with boot option #1 to switch FW image. > 2. FW sends Alive GP[0] MSIx > 3. Driver enables data path (doorbell 0x460 for RBDs, etc...) > 4. Driver gets Bootup event from firmware > 5. Driver performs D0 entry to device (WRITE to IPC_Sleep_Control =0x0) > 6. FW sends Alive GP[0] MSIx > 7. Device host interface is fully set for BT protocol stack operation. > 8. Driver may optionally get debug event with ID 0x97 which can be dropped > > For Intermediate loadger image, all the above steps are applicable > expcept #5 and #6. I see this is already applied to bluetooth-next and is probably immutable, but if not... s/loadger/loader/ s/expcept/except/ But more importantly, the race below is still in linux-next as of next-20241111. I mentioned this before at https://lore.kernel.org/all/20241023191352.GA924610@bhelgaas/#t, but it probably got lost in the suspend/resume conversation. > @@ -1053,8 +1272,11 @@ static int btintel_pcie_send_frame(struct hci_dev *hdev, > struct sk_buff *skb) > { > struct btintel_pcie_data *data = hci_get_drvdata(hdev); > + struct hci_command_hdr *cmd; > + __u16 opcode = ~0; > int ret; > u32 type; > + u32 old_ctxt; > > /* Due to the fw limitation, the type header of the packet should be > * 4 bytes unlike 1 byte for UART. In UART, the firmware can read > @@ -1073,6 +1295,8 @@ static int btintel_pcie_send_frame(struct hci_dev *hdev, > switch (hci_skb_pkt_type(skb)) { > case HCI_COMMAND_PKT: > type = BTINTEL_PCIE_HCI_CMD_PKT; > + cmd = (void *)skb->data; > + opcode = le16_to_cpu(cmd->opcode); > if (btintel_test_flag(hdev, INTEL_BOOTLOADER)) { > struct hci_command_hdr *cmd = (void *)skb->data; > __u16 opcode = le16_to_cpu(cmd->opcode); > @@ -1111,6 +1335,30 @@ static int btintel_pcie_send_frame(struct hci_dev *hdev, > bt_dev_err(hdev, "Failed to send frame (%d)", ret); > goto exit_error; > } > + > + if (type == BTINTEL_PCIE_HCI_CMD_PKT && > + (opcode == HCI_OP_RESET || opcode == 0xfc01)) { > + old_ctxt = data->alive_intr_ctxt; > + data->alive_intr_ctxt = > + (opcode == 0xfc01 ? BTINTEL_PCIE_INTEL_HCI_RESET1 : > + BTINTEL_PCIE_HCI_RESET); > + bt_dev_dbg(data->hdev, "sent cmd: 0x%4.4x alive context changed: %s -> %s", > + opcode, btintel_pcie_alivectxt_state2str(old_ctxt), > + btintel_pcie_alivectxt_state2str(data->alive_intr_ctxt)); > + if (opcode == HCI_OP_RESET) { > + data->gp0_received = false; > + ret = wait_event_timeout(data->gp0_wait_q, > + data->gp0_received, > + msecs_to_jiffies(BTINTEL_DEFAULT_INTR_TIMEOUT_MS)); This looks like a race. You're apparently started something previously that will eventually result in data->gp0_received being set. The ordering you expect is this: 1) Tell device to do something and interrupt on completion 2) Set data->gp0_received = false here 3) Call wait_event_timeout() here 4) Completion interrupt causes btintel_pcie_msix_gp0_handler() to be called 5) btintel_pcie_msix_gp0_handler() sets data->gp0_received = true 6) wait_event_timeout() completes But this ordering can also happen: 1) Tell device to do something and interrupt on completion 2) Completion interrupt causes btintel_pcie_msix_gp0_handler() to be called 3) btintel_pcie_msix_gp0_handler() sets data->gp0_received = true 4) Set data->gp0_received = false here, overwriting the "true" 5) Call wait_event_timeout() here 6) wait_event_timeout() times out because completion interrupt has already happened 7) We complain "No alive interrupt received" here You should be able to force this failure by adding a sleep before setting data->gp0_received = false in this patch. > + if (!ret) { > + hdev->stat.err_tx++; > + bt_dev_err(hdev, "No alive interrupt received for %s", > + btintel_pcie_alivectxt_state2str(data->alive_intr_ctxt)); > + ret = -ETIME; > + goto exit_error; > + } > + } > + } > hdev->stat.byte_tx += skb->len; > kfree_skb(skb);