On Tue, Sep 12, 2023 at 02:47:33PM -0700, Keith Busch wrote: > From: Keith Busch <kbusch@xxxxxxxxxx> > > Some devices are reporting controller ready mode support, but return 0 > for CRTO. These devices require a much higher time to ready than that, > so they are failing to initialize after the driver starter preferring > that value over CAP.TO. > > The spec requires that CAP.TO match the appropritate CRTO value, or be > set to 0xff if CRTO is larger than that. This means that CAP.TO can be > used to validate if CRTO is reliable, and provides an appropriate > fallback for setting the timeout value if not. Use whichever is larger. > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217863 > Reported-by: Cláudio Sampaio <patola@xxxxxxxxx> > Reported-by: Felix Yan <felixonmars@xxxxxxxxxxxxx> > Based-on-a-patch-by: Felix Yan <felixonmars@xxxxxxxxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx > Signed-off-by: Keith Busch <kbusch@xxxxxxxxxx> > --- > drivers/nvme/host/core.c | 48 ++++++++++++++++++++++++---------------- > 1 file changed, 29 insertions(+), 19 deletions(-) > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index 37b6fa7466620..4adc0b2f12f1e 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -2245,25 +2245,8 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl) > else > ctrl->ctrl_config = NVME_CC_CSS_NVM; > > - if (ctrl->cap & NVME_CAP_CRMS_CRWMS) { > - u32 crto; > - > - ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto); > - if (ret) { > - dev_err(ctrl->device, "Reading CRTO failed (%d)\n", > - ret); > - return ret; > - } > - > - if (ctrl->cap & NVME_CAP_CRMS_CRIMS) { > - ctrl->ctrl_config |= NVME_CC_CRIME; > - timeout = NVME_CRTO_CRIMT(crto); > - } else { > - timeout = NVME_CRTO_CRWMT(crto); > - } > - } else { > - timeout = NVME_CAP_TIMEOUT(ctrl->cap); > - } > + if (ctrl->cap & NVME_CAP_CRMS_CRWMS && ctrl->cap & NVME_CAP_CRMS_CRIMS) > + ctrl->ctrl_config |= NVME_CC_CRIME; > > ctrl->ctrl_config |= (NVME_CTRL_PAGE_SHIFT - 12) << NVME_CC_MPS_SHIFT; > ctrl->ctrl_config |= NVME_CC_AMS_RR | NVME_CC_SHN_NONE; > @@ -2277,6 +2260,33 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl) > if (ret) > return ret; > > + /* CAP value may change after initial CC write */ > + ret = ctrl->ops->reg_read64(ctrl, NVME_REG_CAP, &ctrl->cap); > + if (ret) > + return ret; > + > + timeout = NVME_CAP_TIMEOUT(ctrl->cap); > + if (ctrl->cap & NVME_CAP_CRMS_CRWMS) { > + u32 crto; > + > + ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto); > + if (ret) { > + dev_err(ctrl->device, "Reading CRTO failed (%d)\n", > + ret); > + return ret; > + } > + > + /* > + * CRTO should always be greater or equal to CAP.TO, but some > + * devices are known to get this wrong. Use the larger of the > + * two values. > + */ > + if (ctrl->ctrl_config & NVME_CC_CRIME) > + timeout = max(timeout, NVME_CRTO_CRIMT(crto)); > + else > + timeout = max(timeout, NVME_CRTO_CRWMT(crto)); I saw the original bug report. But wasn't the problem that these were compared before NVME_CC_CRIME had been written? i.e. is this max() check still needed for the bug reporter's NVMe drive, after NVME_CC_CRIME was been written and CAP has been re-read? (If so, would a quirk be better?) Kind regards, Niklas