On Fri, Oct 06, 2017 at 12:12:41PM +0200, Thomas Gleixner wrote:
> On Fri, 6 Oct 2017, Chris Wilson wrote:
> > Quoting Daniel Vetter (2017-10-06 10:06:37)
> > > stop_machine is not really a locking primitive we should use, except
> > > when the hw folks tell us the hw is broken and that's the only way to
> > > work around it.
> > >
> > > This patch tries to address the locking abuse of stop_machine() from
> > >
> > > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> > > Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > Date:   Tue Nov 22 14:41:21 2016 +0000
> > >
> > >     drm/i915: Stop the machine as we install the wedged submit_request handler
> > >
> > > Chris said part of the reason for going with stop_machine() was that
> > > it has no overhead on the fast path. But these callbacks use irqsave
> > > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
> >
> > I still want a discussion of the reasons for keeping the normal path
> > clean and why an alternative is sought, here. That design leads into vv
>
> stop_machine() is the last resort when serialization problems cannot be
> solved otherwise. We try to avoid it wherever we can. While at the call
> site it looks simple, it is invasive in terms of locking, as shown by the
> lockdep splat, and it imposes latencies and other side effects on all
> CPUs in the system. So if you don't have a compelling technical reason to
> use it, then it _is_ the wrong tool.
>
> As Daniel has shown it's not required, so there is no technical reason
> why stop_machine() has to be used here.

Well, I'm not sure yet whether my fix is actually correct :-) But imo
there are a bunch more reasons why stop_machine is uncool, beyond just
the "it's a huge shotgun which doesn't play well with anything else"
aspect:

- What we actually seem to want is to make sure that all the
  engine->submit_request callbacks have completed, and they happen to
  all run in hardirq context.
  It's an artifact of stop_machine that it completes all hardirq
  handlers, but afaiui stop_machine is really just aimed at getting all
  CPUs to execute a specific well-known loop (so that your callback can
  start patching .text and other evil stuff). If we move our callback
  into a thread that gets preempted, we have a problem.

- As a consequence, there are no lockdep annotations for the locking we
  actually want. And since this is for gpu hang recovery (something
  relatively rare that just _has_ to work) we really need all the
  support from all the debug tools we can get to catch possible issues.

- Another consequence is that the read-side critical sections aren't
  annotated in the code. That makes it ever so more likely that a
  redesign moves them out of hardirq context and breaks it all.

- Not relevant here (I think), but stop_machine doesn't remove the need
  for read-side (compiler) barriers. In other cases we might still need
  to sprinkle READ_ONCE all over to make sure gcc doesn't reload values
  and create races that way.

rcu has all these bits covered, is maintained by very smart people, and
the overhead is somewhere between 0 and a cacheline access that we touch
anyway (preempt_count is also wrangled by our spinlocks in all the
callbacks). No way this will ever show up against all the mmio writes
the callback does anyway.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx