On 12/06/2019 19:25, Bjorn Andersson wrote:
> On Wed 12 Jun 09:24 PDT 2019, Marc Gonzalez wrote:
>
>> On 05/06/2019 01:24, Bjorn Andersson wrote:
>>
>>> After issuing a PHY_START request to the QMP, the hardware documentation
>>> states that the software should wait for the PCS_READY_STATUS to become 1.
>>>
>>> With the introduction of c9b589791fc1 ("phy: qcom: Utilize UFS reset
>>> controller") an additional 1ms delay was introduced between the start
>>> request and the check of the status bit. This greatly increases the
>>> chances for the hardware to actually becoming ready before the status
>>> bit is read.
>>>
>>> The result can be seen in that UFS PHY enabling is now reported as a
>>> failure in 10% of the boots on SDM845, which is a clear regression from
>>> the previous rare/occasional failure.
>>>
>>> This patch fixes the "break condition" of the poll to check for the
>>> correct state of the status bit.
>>>
>>> Unfortunately PCIe on 8996 and 8998 does not specify the mask_pcs_ready
>>> register, which means that the code checks a bit that's always 0. So the
>>> patch also fixes these, in order to not regress these targets.
>>>
>>> Cc: stable@xxxxxxxxxxxxxxx
>>> Cc: Evan Green <evgreen@xxxxxxxxxxxx>
>>> Cc: Marc Gonzalez <marc.w.gonzalez@xxxxxxx>
>>> Cc: Vivek Gautam <vivek.gautam@xxxxxxxxxxxxxx>
>>> Fixes: 73d7ec899bd8 ("phy: qcom-qmp: Add msm8998 PCIe QMP PHY support")
>>> Fixes: e78f3d15e115 ("phy: qcom-qmp: new qmp phy driver for qcom-chipsets")
>>> Signed-off-by: Bjorn Andersson <bjorn.andersson@xxxxxxxxxx>
>>> ---
>>>
>>> @Kishon, this is a regression spotted in v5.2-rc1, so please consider applying
>>> this towards v5.2.
>>>
>>>  drivers/phy/qualcomm/phy-qcom-qmp.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c
>>> index cd91b4179b10..43abdfd0deed 100644
>>> --- a/drivers/phy/qualcomm/phy-qcom-qmp.c
>>> +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
>>> @@ -1074,6 +1074,7 @@ static const struct qmp_phy_cfg msm8996_pciephy_cfg = {
>>>
>>>          .start_ctrl             = PCS_START | PLL_READY_GATE_EN,
>>>          .pwrdn_ctrl             = SW_PWRDN | REFCLK_DRV_DSBL,
>>> +        .mask_pcs_ready         = PHYSTATUS,
>>>          .mask_com_pcs_ready     = PCS_READY,
>>>
>>>          .has_phy_com_ctrl       = true,
>>> @@ -1253,6 +1254,7 @@ static const struct qmp_phy_cfg msm8998_pciephy_cfg = {
>>>
>>>          .start_ctrl             = SERDES_START | PCS_START,
>>>          .pwrdn_ctrl             = SW_PWRDN | REFCLK_DRV_DSBL,
>>> +        .mask_pcs_ready         = PHYSTATUS,
>>>          .mask_com_pcs_ready     = PCS_READY,
>>>  };
>>>
>>> @@ -1547,7 +1549,7 @@ static int qcom_qmp_phy_enable(struct phy *phy)
>>>          status = pcs + cfg->regs[QPHY_PCS_READY_STATUS];
>>>          mask = cfg->mask_pcs_ready;
>>>
>>> -        ret = readl_poll_timeout(status, val, !(val & mask), 1,
>>> +        ret = readl_poll_timeout(status, val, val & mask, 1,
>>>                                   PHY_INIT_COMPLETE_TIMEOUT);
>>>          if (ret) {
>>>                  dev_err(qmp->dev, "phy initialization timed-out\n");
>>
>> Your patch made me realize that:
>> msm8998_pciephy_cfg.has_phy_com_ctrl = false
>> thus
>> msm8998_pciephy_cfg.mask_com_pcs_ready is useless, AFAICT.
>
> While 8998 has a COM block, it does (among other things) not have a
> ready bit. So afaict has_phy_com_ctrl = false is correct.

Pfff... Working blind without the HPG sucks...

> The addition of mask_pcs_ready is part of resolving the regression in
> 5.2, so I suggest that we remove mask_com_pcs_ready separately.

I agree that it should be done separately.
I'll send a patch on top of yours.

>> (I copied msm8996_pciephy_cfg for msm8998_pciephy_cfg)
>>
>> Does msm8996_pciephy_cfg really need both mask_pcs_ready AND
>> mask_com_pcs_ready?
>
> 8996 has a COM block and it contains both the control bits and the
> status bits, so that looks correct.

Thanks for checking.

>> I'll test your patch tomorrow.
>
> I appreciate that.

Here are my observations for an 8998 board:

1) If I apply only the readl_poll_timeout() fix (not the mask_pcs_ready fixup)
qcom_pcie_probe() fails with a timeout in phy_init.
=> this is in line with your regression analysis.

2) Your patch also fixes a long-standing bug in UFS init whereby sending
lots of information to the console during phy init would lead to an
incorrectly diagnosed time-out.

Good stuff!

Reviewed-by: Marc Gonzalez <marc.w.gonzalez@xxxxxxx>
Tested-by: Marc Gonzalez <marc.w.gonzalez@xxxxxxx>

Regards.
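
P.S. For anyone following the thread, here is a minimal user-space sketch
of why both halves of the patch are needed. It is not the driver code:
fake_readl(), poll_status(), READY_BIT and SIM_TIMEOUT are hypothetical
names that only mimic the shape of the readl_poll_timeout() break
condition, and the bit position is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define READY_BIT    (1u << 0)  /* hypothetical ready bit, position made up */
#define SIM_TIMEOUT  (-1)       /* stand-in for a timeout error code */

/* Simulated PHY: the ready bit reads as 1 once 'ready_after' reads have
 * already happened (ready_after == 0 means "ready before the first read"). */
struct fake_phy {
	unsigned int reads;
	unsigned int ready_after;
};

static uint32_t fake_readl(struct fake_phy *phy)
{
	return (phy->reads++ >= phy->ready_after) ? READY_BIT : 0;
}

/*
 * Poll the simulated status register at most max_reads times.
 * wait_for_set == 1 models the fixed condition  (val & mask);
 * wait_for_set == 0 models the old condition   !(val & mask).
 * Returns 0 when the condition is met, SIM_TIMEOUT otherwise.
 */
static int poll_status(struct fake_phy *phy, uint32_t mask, int wait_for_set,
		       unsigned int max_reads)
{
	unsigned int i;

	for (i = 0; i < max_reads; i++) {
		uint32_t val = fake_readl(phy);
		int bit_set = (val & mask) != 0;

		if (bit_set == wait_for_set)
			return 0;
	}
	return SIM_TIMEOUT;
}

static void demo_poll_conditions(void)
{
	struct fake_phy already_ready = { 0, 0 };
	struct fake_phy slow_start = { 0, 3 };
	struct fake_phy no_mask = { 0, 3 };

	/* Old condition, PHY ready before the first read: the bit never
	 * reads as 0 again, so the poll reports a bogus timeout. */
	assert(poll_status(&already_ready, READY_BIT, 0, 10) == SIM_TIMEOUT);

	/* Fixed condition with a real mask: succeeds once the bit is set. */
	assert(poll_status(&slow_start, READY_BIT, 1, 10) == 0);

	/* Fixed condition with mask == 0: the tested bit is always 0, so
	 * the poll can only time out (observation 1 above). */
	assert(poll_status(&no_mask, 0, 1, 10) == SIM_TIMEOUT);
}
```

Calling demo_poll_conditions() from a main() runs the three cases. The
first models the post-c9b589791fc1 failure, where the extra delay lets the
hardware become ready before the first read; the third models a cfg entry
that leaves mask_pcs_ready unset.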