Actually add Rafael this time around ... -Daniel On Wed, Sep 23, 2015 at 6:27 PM, Daniel Vetter <daniel@xxxxxxxx> wrote: > On Wed, Sep 23, 2015 at 9:57 AM, Daniel Vetter <daniel@xxxxxxxx> wrote: >> On Thu, Sep 17, 2015 at 12:53:21AM +0530, Animesh Manna wrote: >>> >>> >>> On 9/14/2015 1:16 PM, Daniel Vetter wrote: >>> >On Fri, Sep 11, 2015 at 12:36:24AM +0530, Animesh Manna wrote: >>> >> >>> >>On 9/10/2015 8:15 PM, Daniel Vetter wrote: >>> >>>On Thu, Sep 10, 2015 at 01:58:54AM +0530, Animesh Manna wrote: >>> >>>>On 9/2/2015 2:24 PM, Daniel Vetter wrote: >>> >>>>>On Wed, Aug 26, 2015 at 07:40:54PM +0530, Animesh Manna wrote: >>> >>>>>>On 8/26/2015 6:40 PM, Daniel Vetter wrote: >>> >>>>>>>On Wed, Aug 26, 2015 at 01:36:05AM +0530, Animesh Manna wrote: >>> >>>>>>>>Dmc will restore the csr program except DC9, cold boot, >>> >>>>>>>>warm reset, PCI function level reset, and hibernate/suspend. >>> >>>>>>>> >>> >>>>>>>>intel_csr_load_program() function is used to load the firmware >>> >>>>>>>>data from kernel memory to csr address space. >>> >>>>>>>> >>> >>>>>>>>All values of csr address space will be zero if it got reset and >>> >>>>>>>>the first byte of csr program is always a non-zero if firmware >>> >>>>>>>>is loaded successfuly. Based on hardware status will load the >>> >>>>>>>>firmware. >>> >>>>>>>> >>> >>>>>>>>Without this condition check if we overwrite the firmware data the >>> >>>>>>>>counters exposed for dc5/dc6 (help for debugging) will be nullified. >>> >>>>>>Bacause of the above reason mentioned just above we need to block firmware loading again. >>> >>>>>>So only WARN_ON will not help. >>> >>>>>> >>> >>>>>> >>> >>>>>>>>v1: Initial version. >>> >>>>>>>> >>> >>>>>>>>v2: Based on review comments from Daniel, >>> >>>>>>>>- Added a check to know hardware status and load the firmware if not loaded. >>> >>>>>>>> >>> >>>>>>>>Cc: Daniel Vetter <daniel.vetter@xxxxxxxxx> >>> >>>>>>>>Cc: Damien Lespiau <damien.lespiau@xxxxxxxxx> >>> >>>>>>>>Cc: Imre Deak <imre.deak@xxxxxxxxx> >>> >>>>>>>>Cc: Sunil Kamath <sunil.kamath@xxxxxxxxx> >>> >>>>>>>>Signed-off-by: Animesh Manna <animesh.manna@xxxxxxxxx> >>> >>>>>>>>Signed-off-by: Vathsala Nagaraju <vathsala.nagaraju@xxxxxxxxx> >>> >>>>>>>>--- >>> >>>>>>>> drivers/gpu/drm/i915/intel_csr.c | 9 +++++++++ >>> >>>>>>>> 1 file changed, 9 insertions(+) >>> >>>>>>>> >>> >>>>>>>>diff --git a/drivers/gpu/drm/i915/intel_csr.c b/drivers/gpu/drm/i915/intel_csr.c >>> >>>>>>>>index ba1ae03..682cc26 100644 >>> >>>>>>>>--- a/drivers/gpu/drm/i915/intel_csr.c >>> >>>>>>>>+++ b/drivers/gpu/drm/i915/intel_csr.c >>> >>>>>>>>@@ -252,6 +252,15 @@ void intel_csr_load_program(struct drm_device *dev) >>> >>>>>>>> return; >>> >>>>>>>> } >>> >>>>>>>>+ /* >>> >>>>>>>>+ * Dmc will restore the csr the program except DC9, cold boot, >>> >>>>>>>>+ * warm reset, PCI function level reset, and hibernate/suspend. >>> >>>>>>>>+ * This condition will help to check if csr address space is reset/ >>> >>>>>>>>+ * not loaded. >>> >>>>>>>>+ */ >>> >>>>>>>Atm we call this from driver load and resume, which doesn seem to cover >>> >>>>>>>all the cases you mention in the comment. Should this be a WARN_ON >>> >>>>>>>instead? Or do we have troubles in our init sequence where we load too >>> >>>>>>>many times? >>> >>>>>>Yes, the above statement taken from bspec to describe about the special cases dmc will not restore the firmware. >>> >>>>>>Agree, In our cases cold boot and hibernate/suspend mainly we need to load the firmware again, so in my >>> >>>>>>second sentence I wanted to comment mainly regarding this condition check added for suspend-hibernate(reset) >>> >>>>>>and cold boot(not loaded). >>> >>>>>> >>> >>>>>>Anyways the same api later can be used to load the firmware from anywhere, so my intention to check firmware loaded or not. >>> >>>>>>If already loaded then not to overwrite the csr address space to maintain the dc5/dc6 counter value. >>> >>>>>> >>> >>>>>>Can the below comment more clear to you. >>> >>>>>> >>> >>>>>> /* >>> >>>>>> * Dmc will restore the csr the program except DC9, cold boot, >>> >>>>>> * warm reset, PCI function level reset, and hibernate/suspend. >>> >>>>>> * If firmware is restored by dmc then no need to load again which >>> >>>>>> * will keep the dc5/dc6 counter exposed by firmware. >>> >>>>>> */ >>> >>>>>> >>> >>>>>>No issue in init sequence. >>> >>>>>That seems to still cover all the callers of the function afaics - we do >>> >>>>>pci resets over suspend resume unconditionally. So I still don't >>> >>>>>understand where exactly we try to load the dmc firmware in i915.ko when >>> >>>>>it's already loaded. >>> >>>>During resume intel_csr_load_program() will be called from >>> >>>>intel_runtime_resume(). >>> >>>> >>> >>>>intel_runtime_resume()-> skl_resume_prepare()-> intel_csr_load_program() >>> >>>> >>> >>>>During Pc10 entry testing I can see dmc is restoring back the firmware always, >>> >>>>but as you mentioned pci-reset can happen unconditionally, but still then >>> >>>>also during resume intel_runtime_resume() will be called and based on >>> >>>>register read of csr-base-address firmware loading will happen. >>> >>>But in your comment you're saying it won't get restored in case of dc9 and >>> >>>suspend. So that seems to mismatch what you're saying here (and what the >>> >>>commit message says) and what the code does. And this function here is >>> >>>called for resume after suspend/hibernate only. >>> >>pc10 entry explanation I told is for skylake. dc9 in skylake is not possible. >>> >>I think you are confusing between dc6 and dc9. Pc10 can be achieved by >>> >>entering into dc6 (not dc9) for skylake. dc9 is the lowest possible state >>> >>for broxton which is not present for skylake. >>> >I have no idea at all about different pc levels on skl. What I'm talking >>> >about is system suspend/resume and driver load, which are the places this >>> >function gets called. At least afaics. >>> > >>> >>Here intel_csr_load_program() will be used for both skylake and broxton, and instruction >>> >>execution flow will be different in case of suspend/resume which I think is confusing >>> >>you. >>> >That seems like really important information. What's different on bxt? >>> >These are the kind of details you should explain in the commit message ... >>> > >>> >>I am ready explain you in detail. It will be good if we discuss specific use-case scenario >>> >>and itz software design for specific platform. Another point - as dmc related code for >>> >>broxton is not merged better first we close design for skylake. Now, I have added dc9 >>> >>description in comment thinking of future. If you want I can remove for now and later >>> >>can add in bxt patch series for enabling dmc. Will wait for your reply. >>> >This question here isn't about the overall design and how to handle power >>> >wells in skl/bxt. That's a separate discussion and tracked somewhere else. >>> >I'm really just confused about when exactly we need to reload to firmware, >>> >and why we need a runtime check for that. Normally we should know when to >>> >reload the firmware and just either reload or not, without checking hw >>> >state. And I don't like checking for hw state since at least in the past >>> >that kind of code ended up being fragile - it's an illusion that it does >>> >the right thing no matter what, since often there's other tricky ordering >>> >constraints. And if you have automatic duct-tape like then no one will >>> >ever spot those other, harder to spot issues, until an expensive customer >>> >escalation happens. >>> > >>> >So what I want to know here is: >>> >- When exactly do we need to reload dmc firmware. >>> In skl, during driver load first time we load the firmware, during normal >>> suspend-resume (dc6 entry/exit) >>> no need to reload the firmware again as dmc will take care of it. But during >>> suspend/hibernation >>> dmc will not restore the firmware. In that case driver need to reload it >>> again. I do not know >>> how to differentiate pm-suspend and suspend-hibernation and thought both the >>> cases >>> intel_runtime_resume() will be called where we can check the h/w state and >>> reload the >>> firmware if dmc is not restored. >>> >>> In bxt, during driver load first time we load the firmware, during normal >>> suspend-resume >>> display engine will enter into dc9 and dmc will not restore the firmware. So >>> every >>> suspend-resume we need to reload the firmware. >>> >- What exactly is the reason why we can't make that decision statically in >>> > the code (by calling csr_load at the right spots). >>> As I mentioned before in case of skylake can we differentiate between >>> "resume from pm-suspend" with "resume from suspend-hibernation" inside >>> driver? >>> >>> In case of broxton, every time we need to reload, so we can decide >>> statically. >> >> Of course we can differentiate between all the different resume paths, and >> we also have a per-platform split to take care of bxt vs. skl. And there >> are actually 3 different resume paths: >> >> - runtime PM resume. This calls the runtime_resume hook. It sounds like on >> skl we should _not_ load the csr firmware, but on bxt we should load it. >> This can be fixed by removing the intel_csr_load_program call from >> skl_resume_prepare. >> - resume from hibernate-to-disk (i.e. system completely off, state stored >> on the swap partition) is done by calling the thaw callbacks. >> - resume from suspend-to-mem (i.e. system in low-power with only memory >> in self-refresh, all state stored in memory) is done by calling the >> resume callbacks. >> >> For i915 we use unified handlers in our dev_pm_ops for both thaw and >> resume, but it sounds like that won't be a problem for skl/bxt since we >> need to reload the csr firmware in all cases. Although I'm not perfectly >> sure since you don't explain what kind of resume you mean exactly (since >> you don't use the linux names for them). >> >> Anyway it sounds like we can replace this patch by one where we remove >> that errornous csr load call from skl runtime pm resume and that's all. >> But I suggest to make sure we get this right we keep the check you're >> adding here, but wrap it in a WARN_ON. Then we'll get a backtrace when >> this is going wrong again. Like this: >> >> if (WARN_ON(csr_loaded_already())) >> return; >> >> Also when redoing the commits please explain in detail what exactly are >> the requirements like you've done above, but please use the standard linux >> names, i.e. "runtime PM" and "hibernate-to-disk" and "suspend-to-mem". > > Ok hooray there's more suspend-to-something things I've totally missed: > - suspend-to-idle (done by cat freeze > /sys/power/state) and > - suspend (done by cat suspend > /sys/power/state) > > And apparently there's really no way to drivers to tell them apart. > Rafael, is there really no way for drivers to take different paths for > these 3 suspend cases? I tried grepping for PM_SUSPEND_ON/STANDY/MEM > and didn't spot anything. > > Also we're completely missing test coverage for that in igt. That is > something that needs to be fixed asap (yet another case of > combinatorial explosion in igt tests, yay). And at least one of those > suspend-to-idle testcase better be in the BAT. > -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx