On 10/26, Can Guo wrote: > On 2020-10-26 14:13, Jaegeuk Kim wrote: > > On 10/26, Can Guo wrote: > > > On 2020-10-24 23:06, Jaegeuk Kim wrote: > > > > From: Jaegeuk Kim <jaegeuk@xxxxxxxxxx> > > > > > > > > When giving a stress test which enables/disables clkgating, we hit > > > > device > > > > timeout sometimes. This patch avoids subtle racy condition to address > > > > it. > > > > > > > > If we use __ufshcd_release(), I've seen that gate_work can be called in > > > > parallel > > > > with ungate_work, which results in UFS timeout when doing hibern8. > > > > Should avoid it. > > > > > > > > > > I don't understand this comment. gate_work and ungate_work are > > > queued on > > > an ordered workqueue and an ordered workqueue executes at most one > > > work item > > > at any given time in the queued order. How can the two run in > > > parallel? > > > > When I hit UFS stuck, I saw this by clkgating tracepoint. > > > > - REQ_CLK_OFF > > - CLKS_OFF > > - REQ_CLK_OFF > > - REQ_CLKS_ON > > .. > > > > I don't see how can you tell that the two works are running in parallel > just from above trace. May I know what is the exact error by "UFS timeout > when doing hibern8"? > > By using __ufshcd_release() here, I do see one potential issue if your test > quickly toggles on/off of clk_gating - disable it, enable it, disable it and > enable it, which will cause that __ufshcd_release() being called twice, > meaning > we queue two gate_works back to back. So can you try below code and let me > know > if it helps or not? I am OK with your current change, but I would like to > understand the problem. Thanks. > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c > index 1791bce..3eee438 100644 > --- a/drivers/scsi/ufs/ufshcd.c > +++ b/drivers/scsi/ufs/ufshcd.c > @@ -2271,6 +2271,8 @@ static void ufshcd_gate_work(struct work_struct *work) > unsigned long flags; > > spin_lock_irqsave(hba->host->host_lock, flags); > + if (hba->clk_gating.state == CLKS_OFF) > + goto rel_lock; > /* > * In case you are here to cancel this work the gating state > * would be marked as REQ_CLKS_ON. In this case save time by This doesn't help. So, I checked this back again, and, like what you said, now suspect __ufshcd_release() which changed state to REQ_CLKS_OFF on CLKS_OFF. With the below change, I can see the issue anymore. Let me send v4. --- drivers/scsi/ufs/ufshcd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index b8f573a02713..cc8d5f0c3fdc 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -1745,7 +1745,8 @@ static void __ufshcd_release(struct ufs_hba *hba) if (hba->clk_gating.active_reqs || hba->clk_gating.is_suspended || hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL || ufshcd_any_tag_in_use(hba) || hba->outstanding_tasks || - hba->active_uic_cmd || hba->uic_async_done) + hba->active_uic_cmd || hba->uic_async_done || + hba->clk_gating.state == CLKS_OFF) return; hba->clk_gating.state = REQ_CLKS_OFF; -- 2.29.0.rc1.297.gfa9743e501-goog > > Regards, > > Can Guo. > > > By using active_req, I don't see any problem. > > > > > > > > Thanks, > > > > > > Can Guo. > > > > > > > Signed-off-by: Jaegeuk Kim <jaegeuk@xxxxxxxxxx> > > > > --- > > > > drivers/scsi/ufs/ufshcd.c | 12 ++++++------ > > > > 1 file changed, 6 insertions(+), 6 deletions(-) > > > > > > > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c > > > > index b8f573a02713..e0b479f9eb8a 100644 > > > > --- a/drivers/scsi/ufs/ufshcd.c > > > > +++ b/drivers/scsi/ufs/ufshcd.c > > > > @@ -1807,19 +1807,19 @@ static ssize_t > > > > ufshcd_clkgate_enable_store(struct device *dev, > > > > return -EINVAL; > > > > > > > > value = !!value; > > > > + > > > > + spin_lock_irqsave(hba->host->host_lock, flags); > > > > if (value == hba->clk_gating.is_enabled) > > > > goto out; > > > > > > > > - if (value) { > > > > - ufshcd_release(hba); > > > > - } else { > > > > - spin_lock_irqsave(hba->host->host_lock, flags); > > > > + if (value) > > > > + hba->clk_gating.active_reqs--; > > > > + else > > > > hba->clk_gating.active_reqs++; > > > > - spin_unlock_irqrestore(hba->host->host_lock, flags); > > > > - } > > > > > > > > hba->clk_gating.is_enabled = value; > > > > out: > > > > + spin_unlock_irqrestore(hba->host->host_lock, flags); > > > > return count; > > > > }