> -----Original Message----- > From: Intel-gfx [mailto:intel-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx] On Behalf > Of Bloomfield, Jon > Sent: Wednesday, December 6, 2017 9:01 AM > To: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx> > Subject: Re: [PATCH v2] drm/i915: Prevent machine hang from > Broxton's vtd w/a and error capture > > > -----Original Message----- > > From: Chris Wilson [mailto:chris@xxxxxxxxxxxxxxxxxx] > > Sent: Wednesday, December 6, 2017 7:38 AM > > To: intel-gfx@xxxxxxxxxxxxxxxxxxxxx > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>; Bloomfield, Jon > > <jon.bloomfield@xxxxxxxxx>; Harrison, John C > <john.c.harrison@xxxxxxxxx>; > > Ursulin, Tvrtko <tvrtko.ursulin@xxxxxxxxx>; Joonas Lahtinen > > <joonas.lahtinen@xxxxxxxxxxxxxxx>; Daniel Vetter <daniel.vetter@xxxxxxxx> > > Subject: [PATCH v2] drm/i915: Prevent machine hang from Broxton's vtd > w/a > > and error capture > > > > Since capturing the error state requires fiddling around with the GGTT > > to read arbitrary buffers and is itself run under stop_machine(), it > > deadlocks the machine (effectively a hard hang) when run in conjunction > > with Broxton's VTd workaround to serialize GGTT access. > > > > v2: Store the ERR_PTR in first_error so that the error can be reported > > to the user via sysfs. > > > > Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT") > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Cc: Jon Bloomfield <jon.bloomfield@xxxxxxxxx> > > Cc: John Harrison <john.C.Harrison@xxxxxxxxx> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx> > > It's a real shame to lose error capture on BXT. Can we wrap stop_machine to > make it recursive ? > > Something like... > > static cpumask_t sm_mask; > > struct sm_args { > cpu_stop_fn_t *fn; > void *data; > }; > > void do_recursive_stop(void *sm_arg_data) > { > struct sm_arg *args = sm_arg_data; > > /* We're stopped - flag the fact to prevent recursion */ > cpumask_set_cpu(smp_processor_id(), &sm_mask); > > args->fn(args->data); > > /* Re-enable recursion */ > cpumask_clear_cpu(smp_processor_id(), &sm_mask); > } > > void recursive_stop_machine(cpu_stop_fn_t fn, void *data) > { > if (cpumask_test_cpu(smp_processor_id(), &sm_mask)) { > /* We were already stopped, so can just call directly */ > fn(data); > } > else { > /* Our CPU is not currently stopped */ > struct sm_args *args = {fn, data}; > stop_machine(do_recursive_stop, args, NULL); > } > } ... I think a single bool is sufficient in place of the cpumask, since it is set and cleared within stop_machine - I started out trying to set/clear outside. _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx