Re: [PATCH v9 7/7] drm/i915: add a sysfs entry to let users set sseu configs

Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> · Wed, 13 Jun 2018 15:49:49 +0300

Quoting Tvrtko Ursulin (2018-06-12 15:02:07)
> 
> On 12/06/2018 11:52, Lionel Landwerlin wrote:
> > On 12/06/18 11:37, Chris Wilson wrote:
> >> Quoting Lionel Landwerlin (2018-06-12 11:33:34)
> >>> On 12/06/18 10:20, Joonas Lahtinen wrote:
> >>>> Quoting Chris Wilson (2018-06-11 18:02:37)
> >>>>> Quoting Lionel Landwerlin (2018-06-11 14:46:07)
> >>>>>> On 11/06/18 13:10, Tvrtko Ursulin wrote:
> >>>>>>> On 30/05/2018 15:33, Lionel Landwerlin wrote:
> >>>>>>>> There are concerns about denial of service around the per 
> >>>>>>>> context sseu
> >>>>>>>> configuration capability. In a previous commit introducing the
> >>>>>>>> capability we allowed it only for capable users. This changes 
> >>>>>>>> adds a
> >>>>>>>> new debugfs entry to let any user configure its own context
> >>>>>>>> powergating setup.
> >>>>>>> As far as I understood it, Joonas' concerns here are:
> >>>>>>>
> >>>>>>> 1) That in the containers use case individual containers wouldn't be
> >>>>>>> able to turn on the sysfs toggle for them.
> >>>>>>>
> >>>>>>> 2) That also in the containers use case if box admin turns on the
> >>>>>>> feature, some containers would potentially start negatively 
> >>>>>>> affecting
> >>>>>>> the others (via the accumulated cost of slice re-configuration on
> >>>>>>> context switching).
> >>>>>>>
> >>>>>>> I am not familiar with typical container setups to be authoritative
> >>>>>>> here, but intuitively I find it reasonable that a low-level hardware
> >>>>>>> switch like this would be under the control of a master domain
> >>>>>>> administrator. ("If you are installing our product in the container
> >>>>>>> environment, make sure your system administrator enables this 
> >>>>>>> hardware
> >>>>>>> feature.", "Note to system administrators: Enabling this features 
> >>>>>>> may
> >>>>>>> negatively affect the performance of other containers.")
> >>>>>>>
> >>>>>>> Alternative proposal is for the i915 to apply an "or" filter on all
> >>>>>>> requested masks and in that way ensure dynamic re-configuration
> >>>>>>> doesn't happen on context switches, but driven from userspace via 
> >>>>>>> ioctls.
> >>>>>>>
> >>>>>>> In other words, should _all_ userspace agree between themselves that
> >>>>>>> they want to turn off a slice, they would then need to send out a
> >>>>>>> concerted ioctl storm, where number of needed ioctls equals the 
> >>>>>>> number
> >>>>>>> of currently active contexts. (This may have its own performance
> >>>>>>> consequences caused by the barriers needed to modify all context 
> >>>>>>> images.)
> >>>>>>>
> >>>>>>> This was deemed acceptable the the media use case, but my concern is
> >>>>>>> the approach is not elegant and will tie us with the "or" policy in
> >>>>>>> the ABI. (Performance concerns I haven't evaluated yet, but they 
> >>>>>>> also
> >>>>>>> may be significant.)
> >>>>>>>
> >>>>>>> If we go back thinking about the containers use case, then it
> >>>>>>> transpires that even though the "or" policy does prevent one 
> >>>>>>> container
> >>>>>>> from affecting the other from one angle, it also prevents one
> >>>>>>> container from exercising the feature unless all containers 
> >>>>>>> co-operate.
> >>>>>>>
> >>>>>>> As such, we can view the original problem statement where we have an
> >>>>>>> issue if not everyone co-operates, as conceptually the same just 
> >>>>>>> from
> >>>>>>> an opposite angle. (Rather than one container incurring the 
> >>>>>>> increased
> >>>>>>> cost of context switches to the rest, we would have one container
> >>>>>>> preventing the optimized slice configuration to the other.)
> >>>>>>>
> >>>>>>>   From this follows that both proposals require complete 
> >>>>>>> co-operation
> >>>>>>> from all running userspace to avoid complete control of the feature.
> >>>>>>>
> >>>>>>> Since the balance between the benefit of optimized slice 
> >>>>>>> configuration
> >>>>>>> (or penalty of suboptimal one), versus the penalty of increased
> >>>>>>> context switch times, cannot be know by the driver (barring 
> >>>>>>> venturing
> >>>>>>> into the heuristics territory), that is another reason why I find 
> >>>>>>> the
> >>>>>>> "or" policy in the driver questionable.
> >>>>>>>
> >>>>>>> We can also ask a question of - If we go with the "or" policy, why
> >>>>>>> require N per-context ioctls to modify the global GPU configuration
> >>>>>>> and not instead add a global driver ioctl to modify the state?
> >>>>>>>
> >>>>>>> If a future hardware requires, or enables, the per-context behaviour
> >>>>>>> in a more efficient way, we could then revisit the problem space.
> >>>>>>>
> >>>>>>> In the mean time I see the "or" policy solution as adding some ABI
> >>>>>>> which doesn't do anything for many use cases without any way for the
> >>>>>>> sysadmin to enable it. At the same time master sysfs knob at least
> >>>>>>> enables the sysadmin to make a decision. Here I am thinking about a
> >>>>>>> random client environment where not all userspace co-operates, 
> >>>>>>> but for
> >>>>>>> instance user is running the feature aware media stack, and
> >>>>>>> non-feature aware OpenCL/3d stack.
> >>>>>>>
> >>>>>>> I guess the complete story boils down to - is the master sysfs knob
> >>>>>>> really a problem in container use cases.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>>
> >>>>>>> Tvrtko
> >>>>>> Hey Tvrtko,
> >>>>>>
> >>>>>> Thanks for summarizing a bunch of discussions.
> >>>>>> Essentially I agree with every you wrote above.
> >>>>>>
> >>>>>> If we have a global setting (determined by the OR policy), what's the
> >>>>>> point of per context settings?
> >>>>>>
> >>>>>> In Dmitry's scenario, all userspace applications will work 
> >>>>>> together to
> >>>>>> reach the consensus so it sounds like we're reimplementing the policy
> >>>>>> that is already existing in userspace.
> >>>>>>
> >>>>>> Anyway, I'm implementing Joonas' suggestion. Hopefully somebody else
> >>>>>> than me pick one or the other :)
> >>>>> I'll just mention the voting/consensus approach to see if anyone else
> >>>>> likes it.
> >>>>>
> >>>>> Each context has a CONTEXT_PARAM_HINT_SSEU { small, dontcare, large }
> >>>>> (or some other abstract names).
> >>>> Yeah, the param name should have the word _HINT_ in it when it's not a
> >>>> definitive set.
> >>>>
> >>>> There's no global setter across containers, only a scenario when
> >>>> everyone agrees or not. Tallying up the votes and going with a majority
> >>>> vote might be an option, too.
> >>>>
> >>>> Regards, Joonas
> >>> Trying to test the "everyone agrees" approach here.
> >> It's not everyone agrees, but the greater good.
> > 
> > I'm looking forward to the definition of the greater good :)
> > Tvrtko wanted to avoid the heuristic territory, it seems like we're just 
> > stepping into it.
> 
> I am not sure that "small, dontcare, large" models brings much. No one 
> would probably set "dontcare" since we have to default new contexts to 
> large to be compatible.
> 
> Don't know, I still prefer the master knob option. Honestly don't yet 
> see the containers use case as a problem. There is always a master 
> domain in any system where the knob can be enabled if the customers on 
> the system are such to warrant it. On mixed systems enable it or not 
> someone always suffers. And with the knob we are free to add heuristics 
> later, keep the uapi, and just default the knob to on.

Master knob effectively means dead code behind a switch, that's not very
upstream friendly.

Regards, Joonas
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx