On Tue, Oct 10, 2017 at 10:21:45AM +0100, Chris Wilson wrote: > Quoting Daniel Vetter (2017-10-09 17:44:01) > > stop_machine is not really a locking primitive we should use, except > > when the hw folks tell us the hw is broken and that's the only way to > > work around it. > > > > This patch tries to address the locking abuse of stop_machine() from > > > > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5 > > Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Date: Tue Nov 22 14:41:21 2016 +0000 > > > > drm/i915: Stop the machine as we install the wedged submit_request handler > > > > Chris said parts of the reasons for going with stop_machine() was that > > it's no overhead for the fast-path. But these callbacks use irqsave > > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast. > > > > To stay as close as possible to the stop_machine semantics we first > > update all the submit function pointers to the nop handler, then call > > synchronize_rcu() to make sure no new requests can be submitted. This > > should give us exactly the huge barrier we want. > > > > I pondered whether we should annotate engine->submit_request as __rcu > > and use rcu_assign_pointer and rcu_dereference on it. But the reason > > behind those is to make sure the compiler/cpu barriers are there for > > when you have an actual data structure you point at, to make sure all > > the writes are seen correctly on the read side. But we just have a > > function pointer, and .text isn't changed, so no need for these > > barriers and hence no need for annotations. > > > > Unfortunately there's a complication with the call to > > intel_engine_init_global_seqno: > > This is still broken in the same way as nop_submit_request may execute > while you sleep, breaking cancel_requests. I guess then I didn't understand which race you mean, since I think the one I've found should be fixed now. Can you pls explain in more detail what exactly is racing against what else? >From what I can tell legacy cancel_request is entirely under the spinlock, so hard to race, same for lrc. And with the global seqno update untangled, they shouldn't complete too early anymore, which I thought was the race you pointed out on the previous thread. I did reply to that to check my understanding, but didn't get a reply. Thanks, Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx