From: Tomer Tayar <ttayar@xxxxxxxxx>

When a PCIe AXI drain event happens, it is possible that the driver
cannot access the device through PCIe, and therefore cannot send a
hard-reset request to FW.
Starting from FW version 1.13, FW will initiate a hard-reset in such a
case without waiting for a reset request from the driver.

Signed-off-by: Tomer Tayar <ttayar@xxxxxxxxx>
Reviewed-by: Oded Gabbay <ogabbay@xxxxxxxxxx>
Signed-off-by: Oded Gabbay <ogabbay@xxxxxxxxxx>
---
 drivers/accel/habanalabs/common/habanalabs.h | 8 ++++++++
 drivers/accel/habanalabs/gaudi2/gaudi2.c     | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/drivers/accel/habanalabs/common/habanalabs.h b/drivers/accel/habanalabs/common/habanalabs.h
index 1655c101c705..5c69a482b8de 100644
--- a/drivers/accel/habanalabs/common/habanalabs.h
+++ b/drivers/accel/habanalabs/common/habanalabs.h
@@ -3594,6 +3594,14 @@ static inline bool hl_is_fw_sw_ver_below(struct hl_device *hdev, u32 fw_sw_major
 	return false;
 }
 
+static inline bool hl_is_fw_sw_ver_equal_or_greater(struct hl_device *hdev, u32 fw_sw_major,
+							u32 fw_sw_minor)
+{
+	return (hdev->fw_sw_major_ver > fw_sw_major ||
+			(hdev->fw_sw_major_ver == fw_sw_major &&
+				hdev->fw_sw_minor_ver >= fw_sw_minor));
+}
+
 /*
  * Kernel module functions that can be accessed by entire module
  */
diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c
index 819660c684cf..b739078c2d87 100644
--- a/drivers/accel/habanalabs/gaudi2/gaudi2.c
+++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c
@@ -10007,6 +10007,8 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent
 		error_count = gaudi2_handle_pcie_drain(hdev, &eq_entry->pcie_drain_ind_data);
 		reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 		event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+		if (hl_is_fw_sw_ver_equal_or_greater(hdev, 1, 13))
+			is_critical = true;
 		break;
 
 	case GAUDI2_EVENT_PSOC59_RPM_ERROR_OR_DRAIN:
-- 
2.34.1
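
Note on the new helper: hl_is_fw_sw_ver_equal_or_greater() is a
lexicographic comparison on the (major, minor) pair. The userspace sketch
below mirrors that logic so the boundary cases around 1.13 can be checked
outside the kernel. It is only an illustration: struct fw_ver,
fw_ver_at_least() and fw_ver_self_check() are hypothetical names standing
in for struct hl_device and the driver helper, and are not part of the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two version fields the patch reads from struct hl_device. */
struct fw_ver {
	uint32_t major;
	uint32_t minor;
};

/* True when (v->major, v->minor) >= (major, minor) in lexicographic order. */
static inline bool fw_ver_at_least(const struct fw_ver *v, uint32_t major, uint32_t minor)
{
	return v->major > major || (v->major == major && v->minor >= minor);
}

/* Boundary cases around the 1.13 threshold used by the patch. */
static inline void fw_ver_self_check(void)
{
	const struct fw_ver older = { 1, 12 }, exact = { 1, 13 }, next_major = { 2, 0 };

	assert(!fw_ver_at_least(&older, 1, 13));
	assert(fw_ver_at_least(&exact, 1, 13));
	assert(fw_ver_at_least(&next_major, 1, 13));
}
```

A higher major version satisfies the check regardless of its minor
number, which is why 2.0 passes a 1.13 threshold.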