Re: [PATCH v9 35/39] misc/mei/hdcp: Component framework for I915 Interface

On Mon, Dec 17, 2018 at 11:57 AM Winkler, Tomas <tomas.winkler@xxxxxxxxx> wrote:
>
>
> > On Sat, Dec 15, 2018 at 09:20:38PM +0000, Winkler, Tomas wrote:
> > > >
> > > > On Thu, Dec 13, 2018 at 5:27 PM Winkler, Tomas
> > > > <tomas.winkler@xxxxxxxxx>
> > > > wrote:
> > > > >
> > > > > > On Thu, Dec 13, 2018 at 1:36 PM C, Ramalingam
> > > > > > <ramalingam.c@xxxxxxxxx>
> > > > > > wrote:
> > > > > > >
> > > > > > > Tomas and Daniel,
> > > > > > >
> > > > > > > We got an issue here.
> > > > > > >
> > > > > > > The relationship that we try to build between I915 and
> > > > > > > mei_hdcp is as follows:
> > > > > > >
> > > > > > > We are using the components to establish the relationship.
> > > > > > > I915 is the component master, whereas mei_hdcp is the component.
> > > > > > > I915 adds the component master during module load.
> > > > > > > mei_hdcp adds the component when driver->probe is called
> > > > > > > (on device driver binding).
> > > > > > > I915 forces itself such that until the mei_hdcp component is
> > > > > > > added, I915_load won't complete.
> > > > > > > Similarly, on a complete system, if the mei_hdcp component is
> > > > > > > removed, I915 immediately unregisters itself and the HW is
> > > > > > > shut down.
> > > > > > >
> > > > > > > This is completely fine when the modules are loaded and unloaded.
> > > > > > >
> > > > > > > But during suspend, the mei device disappears and the mei bus
> > > > > > > handles it by unbinding device and driver, calling driver->remove.
> > > > > > > This in turn removes the component and triggers the master
> > > > > > > unbind of I915, where I915 unregisters itself.
> > > > > > > This causes the HW state mismatch during suspend and resume.
> > > > > > >
> > > > > > > Please check the powerwell mismatch errors in the CI report for v9:
> > > > > > > https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_3412/fi-glk-j4005/igt@gem_exec_suspend@xxxxxxxxxxxxx
> > > > > > >
> > > > > > > Moreover, unregistering I915 during suspend is not expected.
> > > > > > > So how do we handle this?
> > > > > >
> > > > > > Bit more context from our irc discussion with Ram:
> > > > > >
> > > > > > I found this very surprising, since I don't know of any other
> > > > > > subsystems where the devices get outright removed when going
> > > > > > through a suspend/resume cycle.
> > > > > > The device model was built to handle this stuff
> > > > > > correctly: First clients/devices/interfaces get suspended, then
> > > > > > the parent/bridge/bus. Same dance in reverse when resuming. This
> > > > > > even holds for lots of hotpluggable buses, where child devices
> > > > > > could indeed disappear on resume, but as long as they don't,
> > > > > > everything stays the same. It's really surprising for something
> > > > > > that's soldered onto the board like ME.
> > > > >
> > > > > HDCP is an application in the ME, it's not the ME itself. On the
> > > > > Linux side HDCP2 is a virtual device on the mei client virtual bus;
> > > > > the bus is torn down on ME reset, which mostly happens on power
> > > > > transitions.
> > > > > Theoretically, we could keep it up during power transitions, but
> > > > > so far it was not necessary, and second, it's not guaranteed that
> > > > > all the ME applications will reappear after reset.
> > > >
> > > > When does this happen that an ME application doesn't come back after e.g.
> > > > suspend/resume?
> > > No, this can happen in special flows such as fw updates and error conditions,
> > > but it has to be supported as well.
> > >
> > > >
> > > > Also, where are all the places this reset can happen? Just
> > > > suspend/resume/hibernate and all these, or also at other times?
> > >
> > > Also on errors and fw update; the basic assumption here is that it can happen
> > > any time.
> >
> > If this can happen any time, what are we supposed to do if this happens while
> > we're doing something with the hdcp mei? If this is such a common occurrence I
> > guess we need to somehow wait until everything is rebound and working again. I
> > think ideally mei core would handle that for us, but I guess if this just randomly
> > happens then we need to redo all the transactions. So this does need some
> > involvement of the higher levels.
>
> It's not a common occurrence, but the assumption must be that it can happen any time.
> In that case everything has to be restarted, as there is no state preserved in the ME FW.
> Right, MEI core cannot do it for you, it is just a channel; the logic and state of the connection
> is in the mei_hdcp or gfx. Note that HDCP is not the only App over MEI.

Yes, each mei interface would need to provide suspend/resume
functions, or something like that. Or at least a reset function.

> > Also, how likely is it that the hdcp mei will outright disappear and not come
> > back after a reset?
> >
> > > > How does userspace deal with the reset over s/r? I'm assuming that
> > > > at least the device node file will become invalid (or whatever
> > > > you're using as userspace api), so if userspace is accessing stuff
> > > > on the me at the same time as we do a suspend/resume, what happens?
> >
> > Also, an answer to how other users handle this would be enlightening.

Still looking to understand this here.

> > > > > > Aside: We'll probably need a device_link to make sure mei_hdcp
> > > > > > is fully resumed before i915 gets resumed, but that's kinda a
> > > > > > detail for later on.
> > > > >
> > > > > Frankly I don’t believe there is currently an exact abstraction that
> > > > > supports this model, neither components nor device_link.
> > > > > So far we used class interface for other purposes, it worked well.
> > > >
> > > > I'm not clear on what class interface has to do with component or device
> > > > link.
> > > > They all solve different problems, at least as far as I understand all this stuff ...
> > > > -Daniel
> > >
> > > It comes instead of it; device_link is mostly used for power
> > > management, and component, as we see now, is not what we need as HDCP is
> > > a bit volatile.
> > > class_interface gives you two handlers: add and remove device, that's all
> > > that is needed for the current implementation.
> >
> > Well someone needs to handle the volatility of hdcp, and atm we seem to be
> > playing a game of pass the bucket. I still think that mei_hdcp should supply a
> > clean interface to i915, with all the reset madness handled internally. But
> > depending upon how badly this all leaks we might need to have a retry logic in
> > the i915 hdcp flow too.
>
>
> Restart logic is a must.

Ok, I guess then we need to wrap another layer on top of mei to make
this happen.
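
To make the "restart everything" requirement concrete, here is a minimal userspace sketch of such a wrapping layer. All names (hdcp_step_fn, hdcp_session, hdcp_run_with_restart, the fake_* helpers) are hypothetical and exist only for illustration; none of this is mei or i915 API. It assumes a step reports -ENODEV when the ME client went away underneath it, and that nothing is preserved in the firmware, so the sequence is redone from the first step.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical: one HDCP transaction step that talks to the ME firmware. */
typedef int (*hdcp_step_fn)(void *ctx);

/* Hypothetical session: ordered steps that must all be redone after a reset. */
struct hdcp_session {
	const hdcp_step_fn *steps;
	size_t nr_steps;
	void *ctx;
};

/*
 * Run all steps in order. -ENODEV from a step models "the ME client went
 * away (reset)": restart from step 0, at most max_restarts times. Any
 * other error is returned to the caller unchanged.
 */
static int hdcp_run_with_restart(const struct hdcp_session *s, int max_restarts)
{
	int restarts = 0;
	size_t i = 0;

	while (i < s->nr_steps) {
		int ret = s->steps[i](s->ctx);

		if (ret == 0) {
			i++;
			continue;
		}
		if (ret != -ENODEV)
			return ret;
		if (restarts++ >= max_restarts)
			return -ENODEV;
		i = 0;	/* no state survives in the firmware */
	}
	return 0;
}

/* Fake firmware for illustration: fails with -ENODEV resets_pending times. */
struct fake_me {
	int resets_pending;
	int calls;
};

static int fake_step_ok(void *ctx)
{
	struct fake_me *me = ctx;

	me->calls++;
	return 0;
}

static int fake_step_flaky(void *ctx)
{
	struct fake_me *me = ctx;

	me->calls++;
	if (me->resets_pending > 0) {
		me->resets_pending--;
		return -ENODEV;
	}
	return 0;
}
```

The bounded restart count is a design choice of the sketch: without it, a client that never comes back would turn into an endless retry loop.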

Does mei provide any signal whether a client/app has not survived a
reset? Atm there's no way for us to tell a reset apart from a
"mei_hdcp disappeared for good" event. Which we kinda need to do.
Ideally a reset would be a distinct event and not implemented as an
unbind/rebind cycle like it currently is.
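
Roughly what I have in mind with "distinct event", as a userspace sketch. The event and state names below are made up for illustration and are not existing mei interfaces. The assumption is that the consumer stays bound across a reset and only tears down when the client is reported gone for good.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical events a mei client layer could report to its consumer. */
enum hdcp_cl_event {
	HDCP_CL_EV_RESET_BEGIN,	/* ME reset started, client temporarily down */
	HDCP_CL_EV_RESET_DONE,	/* client came back after the reset */
	HDCP_CL_EV_GONE,	/* client did not come back, treat as removed */
};

enum hdcp_link_state { HDCP_LINK_READY, HDCP_LINK_RESETTING, HDCP_LINK_GONE };

/* Hypothetical consumer-side bookkeeping (the i915 side in this thread). */
struct hdcp_consumer {
	enum hdcp_link_state state;
	bool need_reauth;	/* firmware state was lost, redo authentication */
	bool master_bound;	/* stays true across a plain reset */
};

static void hdcp_consumer_handle(struct hdcp_consumer *c, enum hdcp_cl_event ev)
{
	if (c->state == HDCP_LINK_GONE)
		return;	/* terminal in this sketch; a fresh bind would reinit */

	switch (ev) {
	case HDCP_CL_EV_RESET_BEGIN:
		c->state = HDCP_LINK_RESETTING;
		break;
	case HDCP_CL_EV_RESET_DONE:
		c->state = HDCP_LINK_READY;
		c->need_reauth = true;
		break;
	case HDCP_CL_EV_GONE:
		c->state = HDCP_LINK_GONE;
		c->master_bound = false;
		break;
	}
}
```

The point is only that a reset never touches master_bound, so a suspend/resume cycle would not unregister the consumer.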

> > A device link we'll probably need anyway, since i915 resuming when hdcp is not
> > yet up is not a good idea no matter what's going on.
>
> I've explored device_link and I'm not sure it is suitable: there is no power relationship, on suspend/resume the device disappears.
> I still believe that class_interface is the better choice, in this particular case.

I'm not sure what you mean with class_interface here. How are we
supposed to use that in this case here? I'm not following you at all
here.

I also noticed that resume seems to be entirely deferred to workers:
mei_restart only writes the me start command through the hbm. So all
the clients will only be re-registered some time later on through an
async worker (in the rescan_work). Is that understanding correct? If
that's the case we'd need a way to wait for that, so we know whether
the mei_hdcp is usable again or has disappeared for good.
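
As a sketch of the kind of waiting I mean, here is a userspace model built on POSIX condition variables. In the kernel this would more likely be a struct completion with wait_for_completion_timeout(). The rescan_waiter type and its functions are hypothetical names, and the assumption is that the rescan worker can report at its end whether the hdcp client was found again.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical: lets a consumer wait for the async rescan to finish. */
struct rescan_waiter {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool rescan_done;
	bool hdcp_present;
};

static void rescan_waiter_init(struct rescan_waiter *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);
	w->rescan_done = false;
	w->hdcp_present = false;
}

/* Called by the (hypothetical) rescan worker once it has finished. */
static void rescan_waiter_complete(struct rescan_waiter *w, bool hdcp_present)
{
	pthread_mutex_lock(&w->lock);
	w->rescan_done = true;
	w->hdcp_present = hdcp_present;
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/*
 * Returns 0 if the client is back, -ENODEV if the rescan finished without
 * it, and -ETIMEDOUT if the rescan did not finish within timeout_ms.
 */
static int rescan_waiter_wait(struct rescan_waiter *w, int timeout_ms)
{
	struct timespec deadline;
	int ret = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&w->lock);
	/* Loop to tolerate spurious wakeups; stop on timeout or other error. */
	while (!w->rescan_done && ret == 0)
		ret = pthread_cond_timedwait(&w->cond, &w->lock, &deadline);
	if (!w->rescan_done)
		ret = -ETIMEDOUT;
	else
		ret = w->hdcp_present ? 0 : -ENODEV;
	pthread_mutex_unlock(&w->lock);
	return ret;
}
```

The three return values map onto the three cases we need to tell apart: usable again, gone for good, and still unknown.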

> The whole issue is not yet resolved in the Linux kernel.
> There was a discussion around it in ELC  https://schd.ws/hosted_files/osseu18/0f/deferred_problem.pdf

There's still a bunch of open issues around deferred probe and device
driver loading, but none that would interfere with what we're trying to
do here. At least if mei wouldn't handle resets through a bind/unbind
cycle.
-Daniel

>
> Thanks
> Tomas
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel



