The SOC_HW_VERSION register in the BHI space is not correctly initialized by the device and in many cases contains uninitialized data. The register could contain 0xFFFFFFFF which is a special value to indicate a link error in PCIe, therefore if observed, we could incorrectly think the device is down. Intercept reads for this register, and provide the correct value - every production instance would read 0x60110200 if the device was operating as intended. Fixes: a36bf7af868b ("accel/qaic: Add MHI controller") Signed-off-by: Jeffrey Hugo <quic_jhugo@xxxxxxxxxxx> Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@xxxxxxxxxxx> --- drivers/accel/qaic/mhi_controller.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/accel/qaic/mhi_controller.c b/drivers/accel/qaic/mhi_controller.c index 5036e58e7235..1405623b03e4 100644 --- a/drivers/accel/qaic/mhi_controller.c +++ b/drivers/accel/qaic/mhi_controller.c @@ -404,8 +404,21 @@ static struct mhi_controller_config aic100_config = { static int mhi_read_reg(struct mhi_controller *mhi_cntrl, void __iomem *addr, u32 *out) { - u32 tmp = readl_relaxed(addr); + u32 tmp; + /* + * SOC_HW_VERSION quirk + * The SOC_HW_VERSION register (offset 0x224) is not reliable and + * may contain uninitialized values, including 0xFFFFFFFF. This could + * cause a false positive link down error. Instead, intercept any + * reads and provide the correct value of the register. + */ + if (addr - mhi_cntrl->regs == 0x224) { + *out = 0x60110200; + return 0; + } + + tmp = readl_relaxed(addr); if (tmp == U32_MAX) return -EIO; -- 2.34.1