Quoting Daniel Vetter (2017-10-05 17:12:35) > On Thu, Oct 05, 2017 at 03:30:12PM +0100, Chris Wilson wrote: > > Quoting Daniel Vetter (2017-10-05 15:09:48) > > > stop_machine is not really a locking primitive we should use, except > > > when the hw folks tell us the hw is broken and that's the only way to > > > work around it. > > > > > > This patch here is just a suggestion for how to fix it up, possible > > > changes needed to make it actually work: > > > > > > - Set the nop_submit_request first for _all_ engines, before > > > proceeding. > > > > > > - Make sure engine->cancel_requests copes with the possibility that > > > not all tests have consistently used the new or old version. I dont > > > think this is a problem, since the same can happen really with the > > > stop_machine() locking - stop_machine also doesn't give you any kind > > > of global ordering against other cpu threads, it just makes them > > > stop. > > > > > > This patch tries to address the locking snafu from > > > > There's a locking snafu in the code? > > > > > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5 > > > Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > > Date: Tue Nov 22 14:41:21 2016 +0000 > > > > > > drm/i915: Stop the machine as we install the wedged submit_request handler > > > > > > Chris said parts of the reasons for going with stop_machine() was that > > > it's no overhead for the fast-path. > > > > More than that, you don't even have to think about it. It's a one off > > event that changes execution paths. I actually never thought about > > putting the lock mechanism around the caller (that does prevent the issue > > I was dreading of being inside the callback as it changed), it is still > > magic that has nothing to do with the code flow. What variable should we > > document as being rcu protected, (*engine->submit_request)()? > > Yeah, engine->submit_request would be the one, except it shouldn't use > rcu_assign_pointer/rcu_dereference since we don't need those barriers, > ever - the .text doesn't change after all. That's why I didn't splatter > the sparse annotations all over it. In a way we don't need the data > barriers of rcu, but really only the "everyone has passed the critical > section now" part of it. > > > I'm definitely not sold on having set-wedge dictate terms to the rest of > > the code. > > Well, it sounds like tglx would very much prefer we clean this locking > inversion up on our side and not get them to break the cycle. Not what I > wanted Thomas to look at, but since I botched it and attached the wrong > splat we got the answer for this patch here, and it seems to be "don't do > that". The analogy is with kernel live-patching (which is what we are doing here). Should every module in the kernel manually markup itself to support the esoteric feature, or should the feature support the wider kernel? -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx