Hans de Goede <hdegoede@xxxxxxxxxx> writes: > Hi, > > On 11-01-18 22:42, Hans de Goede wrote: >> Hi, >> >> On 11-01-18 22:17, Ville Syrjälä wrote: >>> On Thu, Jan 11, 2018 at 08:53:42PM +0000, Chris Wilson wrote: >>>> Quoting Ville Syrjälä (2018-01-11 20:10:45) >>>>> On Wed, Jan 10, 2018 at 12:55:05PM +0000, Chris Wilson wrote: >>>>>> While we talk to the punit over its sideband, we need to prevent the cpu >>>>>> from sleeping in order to prevent a potential machine hang. >>>>>> >>>>>> Note that by itself, it appears that pm_qos_update_request (via >>>>>> intel_idle) doesn't provide a sufficient barrier to ensure that all core >>>>>> are indeed awake (out of Cstate) and that the package is awake. To do so, >>>>>> we need to supplement the pm_qos with a manual ping on_each_cpu. >>>>>> >>>>>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109051 >>>>>> References: https://bugs.freedesktop.org/show_bug.cgi?id=102657 >>>>>> References: https://bugzilla.kernel.org/show_bug.cgi?id=195255 >>>>>> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> >>>>>> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> >>>>>> Cc: Hans de Goede <hdegoede@xxxxxxxxxx> >>>>>> --- >>>>>> drivers/gpu/drm/i915/i915_drv.c | 6 ++++ >>>>>> drivers/gpu/drm/i915/i915_drv.h | 1 + >>>>>> drivers/gpu/drm/i915/intel_sideband.c | 61 ++++++++++++++++++++++++----------- >>>>>> 3 files changed, 50 insertions(+), 18 deletions(-) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c >>>>>> index 6c8da9d20c33..d4b90cc0130b 100644 >>>>>> --- a/drivers/gpu/drm/i915/i915_drv.c >>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.c >>>>>> @@ -902,6 +902,9 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv, >>>>>> spin_lock_init(&dev_priv->uncore.lock); >>>>>> mutex_init(&dev_priv->sb_lock); >>>>>> + pm_qos_add_request(&dev_priv->sb_qos, >>>>>> + PM_QOS_CPU_DMA_LATENCY, PM_QOS_DEFAULT_VALUE); >>>>>> + >>>>>> mutex_init(&dev_priv->modeset_restore_lock); >>>>>> mutex_init(&dev_priv->av_mutex); >>>>>> mutex_init(&dev_priv->wm.wm_mutex); >>>>>> @@ -953,6 +956,9 @@ static void i915_driver_cleanup_early(struct drm_i915_private *dev_priv) >>>>>> intel_irq_fini(dev_priv); >>>>>> i915_workqueues_cleanup(dev_priv); >>>>>> i915_engines_cleanup(dev_priv); >>>>>> + >>>>>> + pm_qos_remove_request(&dev_priv->sb_qos); >>>>>> + mutex_destroy(&dev_priv->sb_lock); >>>>>> } >>>>>> static int i915_mmio_setup(struct drm_i915_private *dev_priv) >>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h >>>>>> index a689396d0ff6..ff3f9effc0bb 100644 >>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h >>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h >>>>>> @@ -1887,6 +1887,7 @@ struct drm_i915_private { >>>>>> /* Sideband mailbox protection */ >>>>>> struct mutex sb_lock; >>>>>> + struct pm_qos_request sb_qos; >>>>>> /** Cached value of IMR to avoid reads in updating the bitfield */ >>>>>> union { >>>>>> diff --git a/drivers/gpu/drm/i915/intel_sideband.c b/drivers/gpu/drm/i915/intel_sideband.c >>>>>> index 75c872bb8cc9..02bdd2e2cef6 100644 >>>>>> --- a/drivers/gpu/drm/i915/intel_sideband.c >>>>>> +++ b/drivers/gpu/drm/i915/intel_sideband.c >>>>>> @@ -22,6 +22,8 @@ >>>>>> * >>>>>> */ >>>>>> +#include <asm/iosf_mbi.h> >>>>>> + >>>>>> #include "i915_drv.h" >>>>>> #include "intel_drv.h" >>>>>> @@ -39,18 +41,20 @@ >>>>>> /* Private register write, double-word addressing, non-posted */ >>>>>> #define SB_CRWRDA_NP 0x07 >>>>>> -static int vlv_sideband_rw(struct drm_i915_private *dev_priv, u32 devfn, >>>>>> - u32 port, u32 opcode, u32 addr, u32 *val) >>>>>> +static void ping(void *info) >>>>>> { >>>>>> - u32 cmd, be = 0xf, bar = 0; >>>>>> - bool is_read = (opcode == SB_MRD_NP || opcode == SB_CRRDDA_NP); >>>>>> +} >>>>>> - cmd = (devfn << IOSF_DEVFN_SHIFT) | (opcode << IOSF_OPCODE_SHIFT) | >>>>>> - (port << IOSF_PORT_SHIFT) | (be << IOSF_BYTE_ENABLES_SHIFT) | >>>>>> - (bar << IOSF_BAR_SHIFT); >>>>>> +static int vlv_sideband_rw(struct drm_i915_private *dev_priv, >>>>>> + u32 devfn, u32 port, u32 opcode, >>>>>> + u32 addr, u32 *val) >>>>>> +{ >>>>>> + const bool is_read = (opcode == SB_MRD_NP || opcode == SB_CRRDDA_NP); >>>>>> + int err; >>>>>> - WARN_ON(!mutex_is_locked(&dev_priv->sb_lock)); >>>>>> + lockdep_assert_held(&dev_priv->sb_lock); >>>>>> + /* Flush the previous comms, just in case it failed last time. */ >>>>>> if (intel_wait_for_register(dev_priv, >>>>>> VLV_IOSF_DOORBELL_REQ, IOSF_SB_BUSY, 0, >>>>>> 5)) { >>>>>> @@ -59,22 +63,43 @@ static int vlv_sideband_rw(struct drm_i915_private *dev_priv, u32 devfn, >>>>>> return -EAGAIN; >>>>>> } >>>>>> - I915_WRITE(VLV_IOSF_ADDR, addr); >>>>>> - I915_WRITE(VLV_IOSF_DATA, is_read ? 0 : *val); >>>>>> - I915_WRITE(VLV_IOSF_DOORBELL_REQ, cmd); >>>>>> + iosf_mbi_punit_acquire(); >>>>>> - if (intel_wait_for_register(dev_priv, >>>>>> - VLV_IOSF_DOORBELL_REQ, IOSF_SB_BUSY, 0, >>>>>> - 5)) { >>>>>> + /* >>>>>> + * Prevent the cpu from sleeping while we use this sideband, otherwise >>>>>> + * the punit may cause a machine hang. >>>>>> + */ >>>>>> + pm_qos_update_request(&dev_priv->sb_qos, 0); >>>>>> + on_each_cpu(ping, NULL, 1); >>>>> >>>>> pm_qos_update_request() doesn't wake up the cpus on its own? I wonder >>>>> kind of latency guarantees it can actually give without doing that. >>>> >>>> intel_idle plugs into the update to wake up idle cpus, but it is not >>>> synchronous. If we don't have the on_each_cpu() here, the hangs reoccur; >>>> the easiest answer seems to be that the on_each_cpu() is causing a >>>> schedule of the RT migration thread which is enough to ensure that the >>>> wakeup has occurred. >>>>> Shouldn't we check if we're actually be talking to the punit and not >>>>> some other unit before we do all this extra work? >>>> >>>> Checking port? I am suspicious of the whole iosf mechanism... >>> >>> I doubt iosf sb itself is busted. That would probably make the entire >>> soc pretty much dead. I suspect the problem is either in the punit >>> firmware, or something fishy is going on with power delivery and >>> reducing the rate at which we change the voltages and whatnot helps >>> keep things a bit more stable. >>> >>> So yeah, I would limit this to punit only by checking the port. >>> >>>> >>>> Also debating whether to only apply it to j1900 or all. Is it just pure >>>> fluke that we can easily reproduce it on the quad core Baytrail as >>>> opposed to the dual core vlv or chv? >>> >>> Not sure. I guess if the problem is around power delivery it might even >>> be board specific to some degree. >> >> The wordpress (on azure.com) link here: >> https://bugzilla.kernel.org/show_bug.cgi?id=109051#c860 >> >> Certainly seems to hint at a power delivery problem (with the SoC not >> with specific boards) and all models reported as needing >> intel_idle.max_cstate=1 there are quad-core Bay Trails. >> >> I guess it the problem happens on all quad-core Bay Trail variants when >> the GPU and CPU both increase there power-demand (by up-clocking, resp. >> leaving C6) at the same time and Chris' fix makes sure all CPU cores >> leave C6 before doing the GPU up-clocking avoiding this. >> >> Note this is all speculation / educated guessing. > > Also interesting wrt this is this comment: > > https://bugzilla.kernel.org/show_bug.cgi?id=109051#c752 > > Which speculates along the same line and the commenter has done > extensive testing (with 2 boards) coming to the conclusion that > the dual core part does not have whatever problem he is seeing. > This has been my understanding that 2 core parts are not affected. Further, limiting cores to 2 with 4 core j1900 using maxcpus=2 is an effective workaround. -Mika _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx