[PATCH 10/14] accel/habanalabs: prevent sending heartbeat before events are enabled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: farah kassabri <fkassabri@xxxxxxxxx>

After the heartbeat mechanism is now expanded to be used also
for EQ health check, we shouldn't send heartbeat messages
to FW before driver allow events to be received from FW.

Because if the driver will send two heartbeats before it enables
events to be received from FW, then the EQ health check
will fail and reset the device.

Signed-off-by: farah kassabri <fkassabri@xxxxxxxxx>
Reviewed-by: Oded Gabbay <ogabbay@xxxxxxxxxx>
Signed-off-by: Oded Gabbay <ogabbay@xxxxxxxxxx>
---
 drivers/accel/habanalabs/common/device.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c
index 1d68d2233171..13f14b80a7d4 100644
--- a/drivers/accel/habanalabs/common/device.c
+++ b/drivers/accel/habanalabs/common/device.c
@@ -994,12 +994,7 @@ static void hl_device_eq_heartbeat(struct hl_device *hdev)
 	u64 event_mask = HL_NOTIFIER_EVENT_DEVICE_RESET | HL_NOTIFIER_EVENT_DEVICE_UNAVAILABLE;
 	struct asic_fixed_properties *prop = &hdev->asic_prop;
 
-	 /*
-	  * This feature supported in FW version 1.12.0 45.2.0 and above,
-	  * only on those FW versions eq_health_check_supported will be set.
-	  * Start checking eq health only after driver has enabled events from FW.
-	  */
-	if (!prop->cpucp_info.eq_health_check_supported || !hdev->init_done)
+	if (!prop->cpucp_info.eq_health_check_supported)
 		return;
 
 	if (hdev->eq_heartbeat_received)
@@ -1015,7 +1010,8 @@ static void hl_device_heartbeat(struct work_struct *work)
 	struct hl_info_fw_err_info info = {0};
 	u64 event_mask = HL_NOTIFIER_EVENT_DEVICE_RESET | HL_NOTIFIER_EVENT_DEVICE_UNAVAILABLE;
 
-	if (!hl_device_operational(hdev, NULL))
+	/* Start heartbeat checks only after driver has enabled events from FW */
+	if (!hl_device_operational(hdev, NULL) || !hdev->init_done)
 		goto reschedule;
 
 	/*
-- 
2.34.1




[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux