On Mon, Jul 10, 2017 at 11:31:55AM +0200, Maarten Lankhorst wrote: > Op 10-07-17 om 08:43 schreef Daniel Vetter: > > On Fri, Jul 07, 2017 at 06:18:12PM +0300, Ville Syrjälä wrote: > >> On Fri, Jul 07, 2017 at 04:05:28PM +0200, Daniel Vetter wrote: > >>> On Fri, Jul 7, 2017 at 3:21 PM, Ville Syrjälä > >>> <ville.syrjala@xxxxxxxxxxxxxxx> wrote: > >>>> On Fri, Jul 07, 2017 at 02:03:38PM +0200, Daniel Vetter wrote: > >>>>> On Thu, Jul 06, 2017 at 11:24:41PM +0300, ville.syrjala@xxxxxxxxxxxxxxx wrote: > >>>>>> From: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > >>>>>> > >>>>>> For i915 GPU reset handling we'll want to be able to duplicate the state > >>>>>> that was last commited to the hardware. For that purpose let's start to > >>>>>> track the commited state for each object and provide a way to duplicate > >>>>>> the commmited state into a new drm_atomic_state. The locking for > >>>>>> .commited_state must to be provided by the driver. > >>>>>> > >>>>>> drm_atomic_helper_duplicate_commited_state() duplicates the state > >>>>>> to both old_state and new_state. For the purposes of i915 GPU reset we > >>>>>> would only need one of them, but we actually need two top level states; > >>>>>> one for disabling everything (which would need the duplicated state to > >>>>>> be old_state), and another to reenable everything (which would need the > >>>>>> duplicated state to be new_state). So to make it less comples I figured > >>>>>> I'd just always duplicate both. Might want to rethink this if for no > >>>>>> other reason that reducing the chances of memory allocation failure. > >>>>>> Due to the double state duplication we need > >>>>>> drm_atomic_helper_clean_commited_state() to clean up the duplicated > >>>>>> old_state since that's not handled by the normal drm_atomic_state > >>>>>> cleanup code. > >>>>>> > >>>>>> TODO: do we want this in the helper, or maybe it should be just in i915? > >>>>>> > >>>>>> v2: s/commited/committed/ everywhere (checkpatch) > >>>>>> Handle state duplication errors better > >>>>>> v3: Even more care in dealing with memory allocation errors > >>>>>> Handle private objs too > >>>>>> Deal with the potential ordering issues between swap_state() > >>>>>> and hw_done() by keeping track of which state was swapped in > >>>>>> last > >>>>>> > >>>>>> Signed-off-by: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > >>>>> I still don't get why we need to duplicate the committed state for gpu > >>>>> reset. When I said I'm not against adding all that complexity long-term I > >>>>> meant when we actually really need it. Imo g4x gpu reset isn't a good > >>>>> justification for that, reworking the atomic world for that seems > >>>>> massively out of proportion. > >>>> Well, I still don't see what's so "massive" about a couple of extra state > >>>> pointers hanging around. > >>>> > >>>> Also while doing that state duplication stuff, my old idea of > >>>> splitting the crtc disable and enable phases into separate atomic > >>>> commits popped up again in my head. For that being able to duplicate > >>>> arbitrary states would seem like a nice thing to have. So the > >>>> refactoring I had to do can have other uses. > >>> I fully realize that you're unhappy with how atomic ended up getting > >>> merged and that you think it's a grave mistake. And maybe it is, maybe > >>> it isn't, I have no idea. > >> I don't think I ever said that. I've said that it has certain design > >> problems that we ought to fix. This one being one, and another being > >> to separate the user state from the internal state. The latter I think > >> we'll have to tackle rather soon thanks to some new hardware in the > >> pipeline, or we need to come up with some other way to achieve the > >> same effect. > >> > >>> But right now we have _nothing_ asking for > >>> that reorg afaik, and using gen4 reset to justify it is in my opinion > >>> simply not solid engineering practice. Maybe we need this in the > >>> future, and then we can add it, but not before. Refactoring stuff to > >>> prettify the architecture isn't really useful work. > >> Neither is having to throw out code that already exists and works. If > >> you're so worried about time being wasted on pre-g4x GPU reset, then > >> we could just as well merge my code and move on to more productive > >> endeavors. > > I'm worried about future time wasted on this, not current time. > > > >>>>> Why exactly can't we do this simpler? I still don't get that part. Is > >>>>> there really no solution that doesn't break atomic's current assumption > >>>>> that commits are fully ordered on a given crtc? > >>>> From the point of view of the old and new states it doesn't actually > >>>> break that. The commits done from the reset path are essentially > >>>> invisible to the normal modeset operation. > >>> You insert something into a fully ordered queue. That does break the > >>> entire concept and needs a pile of locks and stuff to make it work. > >> Exactly one lock. Well two if you could the spinlock to protect the > >> committed_state pointer update from parallel updates touching the same > >> kms object. That latter one could be removed if atomic wouldn't allow > >> parallel commits to touch the same object. > >> > >>> Yes it's doable, but it's a redesign with all the implications of > >>> subtle breakage all over. > >> What? It doesn't really even do anything unless you do the > >> duplicate_committed state(). Everything else is just assigning pointers. > >> So unless there's some really obvious bug somewhere it can't break > >> anything outside the GPU reset path. And really the only way to break > >> to GPU reset path is to have actual bugs in the normal display commit > >> code. > > It's the gpu reset I'm worried about. There's no point in fixing it if it > > immediately breaks again. > > > >>> While we do have open bugs for the current > >>> design. Rewriting the world to fix a bug needs seriously better > >>> justification imo. > >>> > >>>> The one alternative proposed idea of allowing gem and kms sides go > >>>> out of whack scares me a bit. I think that might land us in more > >>>> trouble when I finally get around to making the video overlay a > >>>> drm_plane. > >>> We've run perfectly fine with this idea for years. > >> Not perfectly. I've had to fix it several times. And I don't think I was > >> the only one. > > The problem is that no one tests against gen4, and everyone forgets that > > it exists. That's why I'd like something with the minimal amount of > > invasiveness, since that would at least be easier to patch up when we > > inevitably break it. Also, something entirely contained to i915 > > conceptually, without imposing more restrictions on shared code. > >>>> And I think trying to keep the GPU reset paths as similar as possible > >>>> between all the platforms would be a nice thing. Just whacking > >>>> everything on the head with a hammer on one platform but not on > >>>> another one seems to me like extra variation in behaviour that we > >>>> don't necessarily want. > >>>> > >>>> But like I said, if someone can come up with a better solution I > >>>> probably wouldn't object too much. It's not going to be coming from me > >>>> though since I have plenty of other things to do and vacation time is > >>>> coming up very soon. So unless someone else comes up with something nice > >>>> soon I think we should just go with my solution because a) it's already > >>>> available, and b) works quite decently from what I can see. > >>> I guess I'll have to retype the old thing in the new world, but it > >>> really shouldn't be more than the quick draft I've laid down in the > >>> old thread. This here is imo no-go with all the core changes, and even > >>> just done within i915 I think it's highly dubious that it provides a > >>> real benefit, since defacto it means we'll have to abandon the atomic > >>> helpers entirely. > >> Now I think you're just being difficult for the sake of it. Have you > >> looked at the code at all? It's fully done from the atomic helpers > >> right now. And even moving the committed state tracking to i915 > >> wouldn't require abandoning the helpers. We could just update the > >> committed state pointers when we call hw_done(), and we'd have to > >> update the state seqno/age timestamp when we call swap_state(). > >> That's all there is to this. > > I'm concerned with the maintainenace burden you impose on all future i915 > > atomic work, that's all. Yes it looks simple to you and right now, but > > it's another little thing to keep working, and we're really good at > > breaking stuff all the time. > > > > But if you strongly think this is the best possible approach overall, > > taking into account long-term impact, then go ahead with implementing this > > in i915. Adding it the concept of a committed state and being able to > > duplicate that and squeeze another commit in to the shared atomic helpers > > doesn't make sense imo. > > -Daniel > > I think the problem is about struct_mutex usage by atomic commit during reset. > GPU reset has to wait for all previous atomic updates to complete, but > cleanup_planes and prepare_plane_fb both require struct_mutex, which can lead > to a deadlock. #99093 The deadlocks I've seen recently didn't necessarily involve struct_mutex IIRC. Just the modeset locks. > > The real fix is not taking struct_mutex during atomic commit, which will mean > no deadlock can happen. > > Is this the bug being fixed here or am I missing something? This would avoid both struct_mutex and modeset locks in the display reset path, so I guess it should help with struct_mutex issues as well. -- Ville Syrjälä Intel OTC _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel