Hi Can, On Wed, 2020-07-29 at 18:53 +0800, Can Guo wrote: > Hi Stanley, > > On 2020-07-29 18:26, Stanley Chu wrote: > > Hi Can, > > > > On Wed, 2020-07-29 at 16:43 +0800, Can Guo wrote: > >> Hi Stanley, > >> > >> On 2020-07-29 10:40, Stanley Chu wrote: > >> > In ufshcd_suspend(), after clk-gating is suspended and link is set > >> > as Hibern8 state, ufshcd_hold() is still possibly invoked before > >> > ufshcd_suspend() returns. For example, MediaTek's suspend vops may > >> > issue UIC commands which would call ufshcd_hold() during the command > >> > issuing flow. > >> > > >> > Now if UFSHCD_CAP_HIBERN8_WITH_CLK_GATING capability is enabled, > >> > then ufshcd_hold() may enter infinite loops because there is no > >> > clk-ungating work scheduled or pending. In this case, ufshcd_hold() > >> > shall just bypass, and keep the link as Hibern8 state. > >> > > >> > >> The infinite loop is expected as ufshcd_hold is called again after > >> link is put to hibern8 state, so in QCOM's code, we never do this. > > > > Sadly MediaTek have to do this to make our UniPro to enter low-power > > mode. > > > >> The cap UFSHCD_CAP_HIBERN8_WITH_CLK_GATING means UIC link state > >> must not be HIBERN8 after ufshcd_hold(async=false) returns. > > > > If driver is not in PM scenarios, e.g., suspended, above statement > > shall > > be always followed. But two obvious violations are existed, > > > > 1. In ufshcd_suspend(), link is set as HIBERN8 behind ufshcd_hold() > > 2. In ufshcd_resume(), link is set back as Active before > > ufshcd_release() is invoked > > > > So as my understanding, special conditions are allowed in PM scenarios, > > and this is why "hba->clk_gating.is_suspended" is introduced. By this > > thought, I used "hba->clk_gating.is_suspended" in this patch as the > > mandatory condition to allow ufshcd_hold() usage in vendor suspend and > > resume callbacks. > > > > > >> Instead of bailing out from that loop, which makes the logic of > >> ufshcd_hold and clk gating even more complex, how about removing > >> ufshcd_hold/release from ufshcd_send_uic_cmd()? I think they are > >> redundant and we should never send DME cmds if clocks/powers are > >> not ready. I mean callers should make sure they are ready to send > >> DME cmds (and only callers know when), but not leave that job to > >> ufshcd_send_uic_cmd(). It is convenient to remove ufshcd_hold/ > >> release from ufshcd_send_uic_cmd() as there are not many places > >> sending DME cmds without holding the clocks, ufs_bsg.c is one. > >> And I have tested my idea on my setup, it worked well for me. > >> Another benefit is that it also allows us to use DME cmds > >> in clk gating/ungating contexts if we need to in the future. > >> > > > > Brilliant idea! But this may not solve problems if vendor callbacks > > need > > more than UIC commands in the future. > > > > This simple patch could make all vendor operations on UFSHCI in PM > > callbacks possible with UFSHCD_CAP_HIBERN8_WITH_CLK_GATING enabled, and > > again, it allows those operations in PM scenarios only. > > > > Other than UIC cmds, I can only think of device manangement cmds (like > query). > If device management cmds come into the way in the future, we fix it as > well. > I mean that is the right thing to do in my opinion - just like we don't > call > pm_runtime_get_sync() in ufshcd_send_uic_cmd(). > > I can understand that you want a simple/quick fix to get it work for you > once > for all, but from my point of view, debugging clk gating/ungating really > takes > huge efforts sometime (I've spent a lot of time on it). Some flash > vendors also > use it in their own driver widely which makes some failure scenes even > harder to > undertand/debug. So the first thing comes to my head is that we should > avoid > making it more complex or giving it more exceptions. > > From functionality point of view, it looks ok to me. It is just that I > cannot > predict it won't cause new problems since the clk gating/ungating > sequeces are > like magic in some use cases sometime. Thanks for the functionality review. I totally understand what you mentioned above about the clk-gating debugging because we also spent lots of time for issue analysis. I just finished some fault injection for this patch in our platform, the results are fine. The active window of this patch is limited: Starting from ufshcd_link_state_transition() in ufshcd_suspend to ufshcd_vops_resume() in ufshcd_resume() because the link is back to LINKUP state in MediaTek resume callback. So I was focus on injecting errors in our callbacks between this period and most of injected fails triggered host and device reset flow. For example, Suspend: UniPro PowerDownControl timeout Resume: hba_enable timeout Resume: UniPro PowerDownControl timeout Resume: HIBERN8 Exit timeout Hope these tests can ease your concerns. Thanks, Stanley Chu > > Thanks, > > Can Guo. > > >> Please let me know your idea, thanks. > >> > >> Can Guo. > > > > Thanks, > > Stanley Chu > > > >> > >> > Signed-off-by: Stanley Chu <stanley.chu@xxxxxxxxxxxx> > >> > Signed-off-by: Andy Teng <andy.teng@xxxxxxxxxxxx> > >> > > >> > --- > >> > > >> > Changes since v1: > >> > - Fix return value: Use unique bool variable to get the result of > >> > flush_work(). Thcan prevent incorrect returned value, i.e., rc, if > >> > flush_work() returns true > >> > - Fix commit message > >> > > >> > --- > >> > drivers/scsi/ufs/ufshcd.c | 5 ++++- > >> > 1 file changed, 4 insertions(+), 1 deletion(-) > >> > > >> > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c > >> > index 577cc0d7487f..acba2271c5d3 100644 > >> > --- a/drivers/scsi/ufs/ufshcd.c > >> > +++ b/drivers/scsi/ufs/ufshcd.c > >> > @@ -1561,6 +1561,7 @@ static void ufshcd_ungate_work(struct work_struct > >> > *work) > >> > int ufshcd_hold(struct ufs_hba *hba, bool async) > >> > { > >> > int rc = 0; > >> > + bool flush_result; > >> > unsigned long flags; > >> > > >> > if (!ufshcd_is_clkgating_allowed(hba)) > >> > @@ -1592,7 +1593,9 @@ int ufshcd_hold(struct ufs_hba *hba, bool async) > >> > break; > >> > } > >> > spin_unlock_irqrestore(hba->host->host_lock, flags); > >> > - flush_work(&hba->clk_gating.ungate_work); > >> > + flush_result = flush_work(&hba->clk_gating.ungate_work); > >> > + if (hba->clk_gating.is_suspended && !flush_result) > >> > + goto out; > >> > spin_lock_irqsave(hba->host->host_lock, flags); > >> > goto start; > >> > }