On Wed, Sep 30, 2015 at 02:50:40AM +0200, Rafael J. Wysocki wrote: > On 9/29/2015 10:51 AM, Daniel Vetter wrote: > >On Tue, Sep 29, 2015 at 01:54:53AM +0200, Rafael J. Wysocki wrote: > >>On 9/28/2015 8:52 AM, Daniel Vetter wrote: > >>>On Wed, Sep 23, 2015 at 10:49:36PM +0200, Rafael J. Wysocki wrote: > >>>>On 9/23/2015 7:17 PM, Daniel Vetter wrote: > >>>>>acpi_target_system_state() seems to be almost the thing we're looking > >>>>>for, except that it's only valid in the suspend callbacks since it > >>>>>gets reset to ACPI_STATE_S0 when resuming. So probably we want > >>>>>something else ... > >>>>Right. > >>>> > >>>>The idea is to add a way for drivers to check if > >>>>(a) suspend is going to enter the BIOS > >>>>(b) resume has been triggered by the BIOS > >>>>and that's really what drivers need to know. > >>>> > >>>>For suspend-to-idle those two will return false and for S3 they'll return > >>>>true. > >>>> > >>>>Would that help? > >>>Not sure that matches exaxtly what we'd need here ... Essentially we need > >>>to know whether we've been in S3/S4 (firmware has been eaten) or in one of > >>>the higher suspend-to-idle/standby states (firmware still alive, don't > >>>disturb it). Additional fun that just crossed my mind is that if the > >>>suspend-to-mem is aborted (some other driver failed) then that function > >>>should _not_ indicate that we've been in S3. So maybe something like > >>So it really looks like the interface I was talking about would be suitable: > >> > >>pm_suspend_via_firmware() == true -> firmware is going to be eaten (use that > >>during suspend if needed) > >>pm_resume_via_firmware() == true -> firmware was eaten > >> > >>The latter will only return 'true' if we really have entered the BIOS > >>(platform firmware). > >Yeah that seems to fit the bill. We already have a check in our suspend > >paths to figure out whether we'll suspend to idle or go into full S3, so > >i915 could use them both. And making them generic would make sense I > >guess. > > OK, sent patches (CCed you), please have a look. EXPORT_SYMBOL seems missing, but lgtm otherwise. I haven't tried to use them for the two cases in i915 though. -Daniel > > Thanks, > Rafael > > > >>>acpi_source_system_state() which usually is S0 and only when acpi > >>>successfully went into the suspend state in platform_suspend_ops->enter it > >>>gets set to the value of acpi_target_system_state. And then reset once the > >>>resume has completed. I think that would be what we'd want here. > >>We need to new functions like the above, because some things already depend > >>on acpi_target_system_state working the way it does currently. > >> > >>I see no reason to make that ACPI-specific, though, in principle. > >> > >>>Anyway I'll pull in Animesh series meanwhile, amended with a FIXME comment. > >>Fine by me. > >> > >>Thanks, > >>Rafael > >> > >> > >>>>>On Wed, Sep 23, 2015 at 6:28 PM, Daniel Vetter <daniel@xxxxxxxx> wrote: > >>>>>>Actually add Rafael this time around ... > >>>>>>-Daniel > >>>>>> > >>>>>>On Wed, Sep 23, 2015 at 6:27 PM, Daniel Vetter <daniel@xxxxxxxx> wrote: > >>>>>>>On Wed, Sep 23, 2015 at 9:57 AM, Daniel Vetter <daniel@xxxxxxxx> wrote: > >>>>>>>>On Thu, Sep 17, 2015 at 12:53:21AM +0530, Animesh Manna wrote: > >>>>>>>>>On 9/14/2015 1:16 PM, Daniel Vetter wrote: > >>>>>>>>>>On Fri, Sep 11, 2015 at 12:36:24AM +0530, Animesh Manna wrote: > >>>>>>>>>>>On 9/10/2015 8:15 PM, Daniel Vetter wrote: > >>>>>>>>>>>>On Thu, Sep 10, 2015 at 01:58:54AM +0530, Animesh Manna wrote: > >>>>>>>>>>>>>On 9/2/2015 2:24 PM, Daniel Vetter wrote: > >>>>>>>>>>>>>>On Wed, Aug 26, 2015 at 07:40:54PM +0530, Animesh Manna wrote: > >>>>>>>>>>>>>>>On 8/26/2015 6:40 PM, Daniel Vetter wrote: > >>>>>>>>>>>>>>>>On Wed, Aug 26, 2015 at 01:36:05AM +0530, Animesh Manna wrote: > >>>>>>>>>>>>>>>>>Dmc will restore the csr program except DC9, cold boot, > >>>>>>>>>>>>>>>>>warm reset, PCI function level reset, and hibernate/suspend. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>intel_csr_load_program() function is used to load the firmware > >>>>>>>>>>>>>>>>>data from kernel memory to csr address space. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>All values of csr address space will be zero if it got reset and > >>>>>>>>>>>>>>>>>the first byte of csr program is always a non-zero if firmware > >>>>>>>>>>>>>>>>>is loaded successfuly. Based on hardware status will load the > >>>>>>>>>>>>>>>>>firmware. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>Without this condition check if we overwrite the firmware data the > >>>>>>>>>>>>>>>>>counters exposed for dc5/dc6 (help for debugging) will be nullified. > >>>>>>>>>>>>>>>Bacause of the above reason mentioned just above we need to block firmware loading again. > >>>>>>>>>>>>>>>So only WARN_ON will not help. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>v1: Initial version. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>v2: Based on review comments from Daniel, > >>>>>>>>>>>>>>>>>- Added a check to know hardware status and load the firmware if not loaded. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>Cc: Daniel Vetter <daniel.vetter@xxxxxxxxx> > >>>>>>>>>>>>>>>>>Cc: Damien Lespiau <damien.lespiau@xxxxxxxxx> > >>>>>>>>>>>>>>>>>Cc: Imre Deak <imre.deak@xxxxxxxxx> > >>>>>>>>>>>>>>>>>Cc: Sunil Kamath <sunil.kamath@xxxxxxxxx> > >>>>>>>>>>>>>>>>>Signed-off-by: Animesh Manna <animesh.manna@xxxxxxxxx> > >>>>>>>>>>>>>>>>>Signed-off-by: Vathsala Nagaraju <vathsala.nagaraju@xxxxxxxxx> > >>>>>>>>>>>>>>>>>--- > >>>>>>>>>>>>>>>>> drivers/gpu/drm/i915/intel_csr.c | 9 +++++++++ > >>>>>>>>>>>>>>>>> 1 file changed, 9 insertions(+) > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>diff --git a/drivers/gpu/drm/i915/intel_csr.c b/drivers/gpu/drm/i915/intel_csr.c > >>>>>>>>>>>>>>>>>index ba1ae03..682cc26 100644 > >>>>>>>>>>>>>>>>>--- a/drivers/gpu/drm/i915/intel_csr.c > >>>>>>>>>>>>>>>>>+++ b/drivers/gpu/drm/i915/intel_csr.c > >>>>>>>>>>>>>>>>>@@ -252,6 +252,15 @@ void intel_csr_load_program(struct drm_device *dev) > >>>>>>>>>>>>>>>>> return; > >>>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>>+ /* > >>>>>>>>>>>>>>>>>+ * Dmc will restore the csr the program except DC9, cold boot, > >>>>>>>>>>>>>>>>>+ * warm reset, PCI function level reset, and hibernate/suspend. > >>>>>>>>>>>>>>>>>+ * This condition will help to check if csr address space is reset/ > >>>>>>>>>>>>>>>>>+ * not loaded. > >>>>>>>>>>>>>>>>>+ */ > >>>>>>>>>>>>>>>>Atm we call this from driver load and resume, which doesn seem to cover > >>>>>>>>>>>>>>>>all the cases you mention in the comment. Should this be a WARN_ON > >>>>>>>>>>>>>>>>instead? Or do we have troubles in our init sequence where we load too > >>>>>>>>>>>>>>>>many times? > >>>>>>>>>>>>>>>Yes, the above statement taken from bspec to describe about the special cases dmc will not restore the firmware. > >>>>>>>>>>>>>>>Agree, In our cases cold boot and hibernate/suspend mainly we need to load the firmware again, so in my > >>>>>>>>>>>>>>>second sentence I wanted to comment mainly regarding this condition check added for suspend-hibernate(reset) > >>>>>>>>>>>>>>>and cold boot(not loaded). > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>Anyways the same api later can be used to load the firmware from anywhere, so my intention to check firmware loaded or not. > >>>>>>>>>>>>>>>If already loaded then not to overwrite the csr address space to maintain the dc5/dc6 counter value. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>Can the below comment more clear to you. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> /* > >>>>>>>>>>>>>>> * Dmc will restore the csr the program except DC9, cold boot, > >>>>>>>>>>>>>>> * warm reset, PCI function level reset, and hibernate/suspend. > >>>>>>>>>>>>>>> * If firmware is restored by dmc then no need to load again which > >>>>>>>>>>>>>>> * will keep the dc5/dc6 counter exposed by firmware. > >>>>>>>>>>>>>>> */ > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>No issue in init sequence. > >>>>>>>>>>>>>>That seems to still cover all the callers of the function afaics - we do > >>>>>>>>>>>>>>pci resets over suspend resume unconditionally. So I still don't > >>>>>>>>>>>>>>understand where exactly we try to load the dmc firmware in i915.ko when > >>>>>>>>>>>>>>it's already loaded. > >>>>>>>>>>>>>During resume intel_csr_load_program() will be called from > >>>>>>>>>>>>>intel_runtime_resume(). > >>>>>>>>>>>>> > >>>>>>>>>>>>>intel_runtime_resume()-> skl_resume_prepare()-> intel_csr_load_program() > >>>>>>>>>>>>> > >>>>>>>>>>>>>During Pc10 entry testing I can see dmc is restoring back the firmware always, > >>>>>>>>>>>>>but as you mentioned pci-reset can happen unconditionally, but still then > >>>>>>>>>>>>>also during resume intel_runtime_resume() will be called and based on > >>>>>>>>>>>>>register read of csr-base-address firmware loading will happen. > >>>>>>>>>>>>But in your comment you're saying it won't get restored in case of dc9 and > >>>>>>>>>>>>suspend. So that seems to mismatch what you're saying here (and what the > >>>>>>>>>>>>commit message says) and what the code does. And this function here is > >>>>>>>>>>>>called for resume after suspend/hibernate only. > >>>>>>>>>>>pc10 entry explanation I told is for skylake. dc9 in skylake is not possible. > >>>>>>>>>>>I think you are confusing between dc6 and dc9. Pc10 can be achieved by > >>>>>>>>>>>entering into dc6 (not dc9) for skylake. dc9 is the lowest possible state > >>>>>>>>>>>for broxton which is not present for skylake. > >>>>>>>>>>I have no idea at all about different pc levels on skl. What I'm talking > >>>>>>>>>>about is system suspend/resume and driver load, which are the places this > >>>>>>>>>>function gets called. At least afaics. > >>>>>>>>>> > >>>>>>>>>>>Here intel_csr_load_program() will be used for both skylake and broxton, and instruction > >>>>>>>>>>>execution flow will be different in case of suspend/resume which I think is confusing > >>>>>>>>>>>you. > >>>>>>>>>>That seems like really important information. What's different on bxt? > >>>>>>>>>>These are the kind of details you should explain in the commit message ... > >>>>>>>>>> > >>>>>>>>>>>I am ready explain you in detail. It will be good if we discuss specific use-case scenario > >>>>>>>>>>>and itz software design for specific platform. Another point - as dmc related code for > >>>>>>>>>>>broxton is not merged better first we close design for skylake. Now, I have added dc9 > >>>>>>>>>>>description in comment thinking of future. If you want I can remove for now and later > >>>>>>>>>>>can add in bxt patch series for enabling dmc. Will wait for your reply. > >>>>>>>>>>This question here isn't about the overall design and how to handle power > >>>>>>>>>>wells in skl/bxt. That's a separate discussion and tracked somewhere else. > >>>>>>>>>>I'm really just confused about when exactly we need to reload to firmware, > >>>>>>>>>>and why we need a runtime check for that. Normally we should know when to > >>>>>>>>>>reload the firmware and just either reload or not, without checking hw > >>>>>>>>>>state. And I don't like checking for hw state since at least in the past > >>>>>>>>>>that kind of code ended up being fragile - it's an illusion that it does > >>>>>>>>>>the right thing no matter what, since often there's other tricky ordering > >>>>>>>>>>constraints. And if you have automatic duct-tape like then no one will > >>>>>>>>>>ever spot those other, harder to spot issues, until an expensive customer > >>>>>>>>>>escalation happens. > >>>>>>>>>> > >>>>>>>>>>So what I want to know here is: > >>>>>>>>>>- When exactly do we need to reload dmc firmware. > >>>>>>>>>In skl, during driver load first time we load the firmware, during normal > >>>>>>>>>suspend-resume (dc6 entry/exit) > >>>>>>>>>no need to reload the firmware again as dmc will take care of it. But during > >>>>>>>>>suspend/hibernation > >>>>>>>>>dmc will not restore the firmware. In that case driver need to reload it > >>>>>>>>>again. I do not know > >>>>>>>>>how to differentiate pm-suspend and suspend-hibernation and thought both the > >>>>>>>>>cases > >>>>>>>>>intel_runtime_resume() will be called where we can check the h/w state and > >>>>>>>>>reload the > >>>>>>>>>firmware if dmc is not restored. > >>>>>>>>> > >>>>>>>>>In bxt, during driver load first time we load the firmware, during normal > >>>>>>>>>suspend-resume > >>>>>>>>>display engine will enter into dc9 and dmc will not restore the firmware. So > >>>>>>>>>every > >>>>>>>>>suspend-resume we need to reload the firmware. > >>>>>>>>>>- What exactly is the reason why we can't make that decision statically in > >>>>>>>>>> the code (by calling csr_load at the right spots). > >>>>>>>>>As I mentioned before in case of skylake can we differentiate between > >>>>>>>>>"resume from pm-suspend" with "resume from suspend-hibernation" inside > >>>>>>>>>driver? > >>>>>>>>> > >>>>>>>>>In case of broxton, every time we need to reload, so we can decide > >>>>>>>>>statically. > >>>>>>>>Of course we can differentiate between all the different resume paths, and > >>>>>>>>we also have a per-platform split to take care of bxt vs. skl. And there > >>>>>>>>are actually 3 different resume paths: > >>>>>>>> > >>>>>>>>- runtime PM resume. This calls the runtime_resume hook. It sounds like on > >>>>>>>> skl we should _not_ load the csr firmware, but on bxt we should load it. > >>>>>>>> This can be fixed by removing the intel_csr_load_program call from > >>>>>>>> skl_resume_prepare. > >>>>>>>>- resume from hibernate-to-disk (i.e. system completely off, state stored > >>>>>>>> on the swap partition) is done by calling the thaw callbacks. > >>>>>>>>- resume from suspend-to-mem (i.e. system in low-power with only memory > >>>>>>>> in self-refresh, all state stored in memory) is done by calling the > >>>>>>>> resume callbacks. > >>>>>>>> > >>>>>>>>For i915 we use unified handlers in our dev_pm_ops for both thaw and > >>>>>>>>resume, but it sounds like that won't be a problem for skl/bxt since we > >>>>>>>>need to reload the csr firmware in all cases. Although I'm not perfectly > >>>>>>>>sure since you don't explain what kind of resume you mean exactly (since > >>>>>>>>you don't use the linux names for them). > >>>>>>>> > >>>>>>>>Anyway it sounds like we can replace this patch by one where we remove > >>>>>>>>that errornous csr load call from skl runtime pm resume and that's all. > >>>>>>>>But I suggest to make sure we get this right we keep the check you're > >>>>>>>>adding here, but wrap it in a WARN_ON. Then we'll get a backtrace when > >>>>>>>>this is going wrong again. Like this: > >>>>>>>> > >>>>>>>> if (WARN_ON(csr_loaded_already())) > >>>>>>>> return; > >>>>>>>> > >>>>>>>>Also when redoing the commits please explain in detail what exactly are > >>>>>>>>the requirements like you've done above, but please use the standard linux > >>>>>>>>names, i.e. "runtime PM" and "hibernate-to-disk" and "suspend-to-mem". > >>>>>>>Ok hooray there's more suspend-to-something things I've totally missed: > >>>>>>>- suspend-to-idle (done by cat freeze > /sys/power/state) and > >>>>>>>- suspend (done by cat suspend > /sys/power/state) > >>>>>>> > >>>>>>>And apparently there's really no way to drivers to tell them apart. > >>>>>>>Rafael, is there really no way for drivers to take different paths for > >>>>>>>these 3 suspend cases? I tried grepping for PM_SUSPEND_ON/STANDY/MEM > >>>>>>>and didn't spot anything. > >>>>>>> > >>>>>>>Also we're completely missing test coverage for that in igt. That is > >>>>>>>something that needs to be fixed asap (yet another case of > >>>>>>>combinatorial explosion in igt tests, yay). And at least one of those > >>>>>>>suspend-to-idle testcase better be in the BAT. > >>>>>>>-Daniel > >>>>>>>-- > >>>>>>>Daniel Vetter > >>>>>>>Software Engineer, Intel Corporation > >>>>>>>+41 (0) 79 365 57 48 - http://blog.ffwll.ch > >>>>>>-- > >>>>>>Daniel Vetter > >>>>>>Software Engineer, Intel Corporation > >>>>>>+41 (0) 79 365 57 48 - http://blog.ffwll.ch > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx