Re: [PATCH v2] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: Intel-gfx [mailto:intel-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx] On Behalf
> Of Bloomfield, Jon
> Sent: Wednesday, December 6, 2017 9:01 AM
> To: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> Subject: Re:  [PATCH v2] drm/i915: Prevent machine hang from
> Broxton's vtd w/a and error capture
> 
> > -----Original Message-----
> > From: Chris Wilson [mailto:chris@xxxxxxxxxxxxxxxxxx]
> > Sent: Wednesday, December 6, 2017 7:38 AM
> > To: intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>; Bloomfield, Jon
> > <jon.bloomfield@xxxxxxxxx>; Harrison, John C
> <john.c.harrison@xxxxxxxxx>;
> > Ursulin, Tvrtko <tvrtko.ursulin@xxxxxxxxx>; Joonas Lahtinen
> > <joonas.lahtinen@xxxxxxxxxxxxxxx>; Daniel Vetter <daniel.vetter@xxxxxxxx>
> > Subject: [PATCH v2] drm/i915: Prevent machine hang from Broxton's vtd
> w/a
> > and error capture
> >
> > Since capturing the error state requires fiddling around with the GGTT
> > to read arbitrary buffers and is itself run under stop_machine(), it
> > deadlocks the machine (effectively a hard hang) when run in conjunction
> > with Broxton's VTd workaround to serialize GGTT access.
> >
> > v2: Store the ERR_PTR in first_error so that the error can be reported
> > to the user via sysfs.
> >
> > Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT")
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Jon Bloomfield <jon.bloomfield@xxxxxxxxx>
> > Cc: John Harrison <john.C.Harrison@xxxxxxxxx>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> 
> It's  a real shame to lose error capture on BXT. Can we wrap stop_machine to
> make it recursive ?
> 
> Something like...
> 
> static cpumask_t sm_mask;
> 
> struct sm_args {
>         cpu_stop_fn_t *fn;
>         void *data;
> };
> 
> void do_recursive_stop(void *sm_arg_data)
> {
>         struct sm_arg *args = sm_arg_data;
> 
>         /* We're stopped - flag the fact to prevent recursion */
>         cpumask_set_cpu(smp_processor_id(), &sm_mask);
> 
>         args->fn(args->data);
> 
>         /* Re-enable recursion */
>         cpumask_clear_cpu(smp_processor_id(), &sm_mask);
> }
> 
> void recursive_stop_machine(cpu_stop_fn_t fn, void *data)
> {
>         if (cpumask_test_cpu(smp_processor_id(), &sm_mask)) {
>                 /* We were already stopped, so can just call directly */
>                 fn(data);
>         }
>         else {
>                 /* Our CPU is not currently stopped */
>                 struct sm_args *args = {fn, data};
>                 stop_machine(do_recursive_stop, args, NULL);
>         }
> }

... I think a single bool is sufficient in place of the cpumask, since it is set and cleared
within stop_machine - I started out trying to set/clear outside.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux