[PATCH 0/6] Enable PC8+ states

daniel at ffwll.ch (Daniel Vetter) · Wed, 12 Jun 2013 18:27:21 +0200

On Wed, Jun 05, 2013 at 02:21:50PM -0300, Paulo Zanoni wrote:
> From: Paulo Zanoni <paulo.r.zanoni at intel.com>
> 
> Hi
> 
> So this series is a second version of the patch I sent on April 16th. Daniel
> asked to write some patches to properly initialize the interrupts we're touching
> on the PC8+ patches, so the first 5 patches should do this.
> 
> The 6th patch is the big one enabling PC8+ and even though I still didn't
> address all of Daniel's concerns from his first review, the patch already works:
> we get into PC10 and when we get back to PC7 things are still working.

Ok, so real review now, not just function name bikeshedding ;-)

* I'd really like to see a testcase, but not per se to exercise pc8+ but
  to check that our prepare/exit code and especially the interrupt
  disabling/enabling doesn't break anything. So the overall structure of
  the test should probably be something like

  1. Put the console into raw mode (so that fbcon doesn't interfere),
  Imre's recent infrastructure patches should make this really simple.

  2. Disable all outputs with setcrtc(fb=NULL).

  3. Wait a bit until we're sure that our code has prepared for pc8+, i.e.
  all the interrupts are shut off, lcpll is disabled ...

  4. Run a subtest.

  5. Repeat.

  For subtests we need quite a pile of them I think:
  - Test getconnectors ioctl to make sure both our ->detec and ->get_modes
    callbacks have the proper aux_display runtime pm get/put calls. On a
    quick read through your patch the get/put (currently still call
    forbid/allow) in the ->get_modes hooks seem to be missing.
  - Direct i2c access from userspace. This happens both from random tools
    probing the i2c bus, but also for real uses-cases, like the ddccontrol
    tool or e.g. the Chromebook Pixie which uses gmbus as the hw i2c
    engine for the touchpad.
  - Submit a patch (this might just work when waking the gt automatically
    wakes up everything) _and_ do a blocking wait for it (this will break
    if interrupts are dead).

  It's probably good to have some correctness checks for these, e.g.
  compare the edid property while outputs are still on (so pc8+ is
  impossible) with what happens when they're off. Execbuf could be tested
  with a simple fill blt. Imo it's also important that we have a wait
  before each new subtest, to make sure we really teste everything. E.g. a
  subtest only checks EDID for one output, then waits again. Otherwise it
  could be that the working refcounting in e.g. HDMI could shadow broken
  runtime pm refcounting for dp aux if it's done right afterwards.

* spin_lock_irqsave isn't that irq-save ;-) It only blocks interrupt on
  the local cpu. See Rusty's unreliable guide to kernel locking in
  Documentation/DocBook/kernel-locking. If you don't understand what
  spin_lock_irqsave is useful for, how our current locking works or why
  the one in your patch is broken I think we should discuss this
  afterwards. Reading my irq locking review patches and comparing it to
  the locking guid could also be interesting.

  I haven't yet done the full review, but probably the safest route is to
  fully disable our interrupt handler with disable_irq while we do the
  pc8+ prepare sequence. On the exit sequence we probably don't need it.
  In any case we can reenable interrupts again with enable_irq. Maybe
  there's a possible race there (I need to review the core irq code
  first), but now I think having a tiny race there where we could miss hpd
  events is less evil than adding complex locking just for pc8+ entry.
  Even more so with the next point, which aims to unify pc8+ entry/exit
  with our existing suspend/resume code. And s/r runs in a non-reentrant
  enviroment, so if we can keep those basic assumptions that's imo
  worthwile.

* Unifying irq setup/teardown code. I'm not really happy with the
  setup/teardown code, and I think we should take a good look at trying to
  unify it with the existing driver load/unload/suspend/resume code.
  Massive register save/restore code tends to get out of sync with other
  code changes way too easily.

  Currently our irq enable sequence is:
  1. call ->irq_preinstall hook
  2. enable irq handler
  3. call ->irq_postinstall hook
  and somewhen later
  4. enable hotplug interrupt handling with our own ->hpd_irq_setup hook.

  The disabling sequence is:
  1. disable irq handler
  2. call ->irq_uninstall hook

  Now for pc8+ we want to disable/enable almost everything _but_ the hpd
  interrupts. I think with some code wrestling we should be able to do
  that with the existing functions:
  - Split the irq_unistall into two parts, irq_unistall which disables all
    interrupt sources _but_ hotplug, and a hpd_irq_teardown which disables
    hotplug, too. For that we'd need an i915_irq_unistall functions which
    first calls drm_irq_uninstall and the new ->hpd_irq_teardown hook.
    That would also allow us to unify the hotplug reenable timer teardown
    a bit.
  - Rework ->irq_postinstall and ->irq_uninstall to not frob hpd
    settings. Since we control hpd interrupts all through SDEIMR it should
    be fairly simple to disable everything but hotplug interrupts, even
    more so with the new helpers I've added in my irq locking review
    series.
  - Important is that the new ->irq_uninstall hook won't disable the
    master interrupt sources, that'd would all be moved into the
    ->hpd_irq_teardown code.

  With the pc8+ prepare sequence would look like
  1. disable irq handler
  2. call ->irq_unistall hook directly
  -> hpd interrupts are still working
  3. re-enable irq handler

  And the exit sequence would be
  1. disable irq handler
  2. call ->irq_postinstall
  3. re-enable irq handler

* As you might have guessed I see the biggest risk in us screwing up the
  interrupt handling and especially someone sneaking in while we're in
  pc8+ mode and changing the irq registers. Safe for the hotplug bits
  which might need masking due to a hotplug storm I expect all such
  changes to be bugs.

  The upshot of the above proposal now is that we can make sure that for
  all the pc8+ relevant interrupt bits calling ->irq_uninstall leaves
  behind the same state as ->irq_preinstall. So we could sprinkle massive
  amounts of WARNs into ->irq_postinstall to check that all the interupt
  registers are still in the correct state and no one changed them.

* Finally (and this part is really just an idea I'll throw out here) our
  driver setup/teardown sequences are getting pretty complicated, and
  different parts (load, s/r, pc8+, ...) will use them ever so slightly. I
  think adding an

  atomic_t dev_priv->driver_stage_ready

  bit mask would be useful. Every time a driver stage is
  loaded/setup/resumed it sets the bit, if we unload/suspend/teardown it
  gets cleared. And every driver setup stage can check at the beginning
  whether all the required pieces are present. This should help us in
  documenting the sometimes tricky ordering constraints in general.

  For this case specifically I'm thinking of 2 driver stages:
  - general_irq_ready
  - hpd_irq_ready
  Since atomic_read is as cheap as any unordered load we could sprinkle a
  few of these checks over the code, e.g. in the gmbus or dp aux irq-based
  wait code. Similar e.g. for gem in the __wait_seqno function. This
  should help a lot in debugging bugs due to missing refcounting.

For priorities I think the best approach is to start with the testcase.
I think currently you're code is missing quite a few runtime pm get/put
calls, so I'm pretty sure it'll blow up. And if it doesn't my
understanding of how pc8+ works is seriously wrong.

And once we have a solid testcase it should be much easier to move the
code piece around a bit so that we have a sane pc8+ enter/exit sequence.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch