On 2019-10-11 1:12 p.m., tj@xxxxxxxxxx wrote:
Hello, Daniel.
On Wed, Oct 09, 2019 at 06:06:52PM +0200, Daniel Vetter wrote:
That's not the point I was making. For cpu cgroups there's a very well-defined
connection between the cpu bitmasks/numbers in cgroups and the cpu
bitmasks you use in various system calls (they match), and that stuff
works across vendors.
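
For concreteness, a minimal sketch of what "they match" means, assuming
cgroup v2 is mounted at /sys/fs/cgroup and the cpuset controller is
available for the task's cgroup: the CPUs listed in cpuset.cpus.effective
are the same CPUs the scheduler reports for a task in that cgroup via
sched_getaffinity(2).

/* Sketch only: compare the effective cpuset of the current cgroup
 * (cgroup v2 assumed at /sys/fs/cgroup, task assumed to sit in the
 * root cgroup) with the affinity mask sched_getaffinity(2) reports.
 * For an otherwise unconstrained task the two sets should agree. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	cpu_set_t set;
	CPU_ZERO(&set);
	if (sched_getaffinity(0, sizeof(set), &set)) {
		perror("sched_getaffinity");
		return 1;
	}

	printf("sched_getaffinity: ");
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			printf("%d ", cpu);
	printf("\n");

	/* cpuset.cpus.effective uses a list format like "0-3,8-11". */
	FILE *f = fopen("/sys/fs/cgroup/cpuset.cpus.effective", "r");
	if (f) {
		char buf[256];
		if (fgets(buf, sizeof(buf), f))
			printf("cpuset.cpus.effective: %s", buf);
		fclose(f);
	}
	return 0;
}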
Please note that there are a lot of limitations even to cpuset.
Affinity is easy to implement and looks attractive as a way to get
absolute isolation, but it's inherently cumbersome and limited in
granularity. It can also lead to surprising failure modes, where
contention on one cpu can't be resolved by the load balancer and causes
system-wide slowdowns / stalls through the dependency chain anchored at
the affinity-limited tasks.
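
A toy illustration of that dependency chain, not taken from any real
workload and with all names made up: a lock holder pinned to cpu 0
competes with a cpu hog pinned to the same cpu, the load balancer
cannot migrate the lock holder elsewhere, and every waiter on the lock
inherits the slowdown even though other cpus are idle.

/* Toy only: pinned lock holder + pinned hog on cpu 0, unpinned waiter
 * stalls behind the lock. Build with: gcc -pthread; run under timeout. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void pin_to_cpu0(void)
{
	cpu_set_t set;
	CPU_ZERO(&set);
	CPU_SET(0, &set);
	pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *pinned_lock_holder(void *arg)
{
	(void)arg;
	pin_to_cpu0();
	for (;;) {
		pthread_mutex_lock(&lock);
		usleep(1000);	/* "work" that needs time on cpu 0 */
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

static void *pinned_cpu_hog(void *arg)
{
	(void)arg;
	pin_to_cpu0();
	for (;;)
		;	/* steals cpu 0 time from the lock holder */
	return NULL;
}

static void *unpinned_waiter(void *arg)
{
	(void)arg;
	/* free to run anywhere, but still stalls behind the lock */
	for (;;) {
		pthread_mutex_lock(&lock);
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t t[3];
	pthread_create(&t[0], NULL, pinned_lock_holder, NULL);
	pthread_create(&t[1], NULL, pinned_cpu_hog, NULL);
	pthread_create(&t[2], NULL, unpinned_waiter, NULL);
	pthread_join(t[0], NULL);	/* never returns */
	return 0;
}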
Maybe this is less of a problem for gpu workloads, but in general, the
more constraints are put on scheduling, the more likely the system is
to develop twisted dependency chains while other parts of the system
sit idle.
How does scheduling currently work when there are competing gpu
workloads? There has to be some fairness provision, whether that's
unit-allocation based or time slicing, right?
The scheduling of competing workloads on GPUs is handled in hardware and
firmware. The Linux kernel and driver are not really involved. We have
some knobs we can tweak in the driver (queue and pipe priorities,
resource reservations for certain types of workloads), but they are
pretty HW-specific and I wouldn't make any claims about fairness.
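
Just to make the "fairness provision" idea concrete, here is a toy,
purely illustrative sketch of proportional time slicing by weight. This
is not what the hardware/firmware scheduler actually does, and the
queue names and numbers are made up.

/* Toy only: hand out slices of a scheduling period in proportion to
 * per-queue weights (same idea as cgroup v2 cpu.weight). A
 * work-conserving scheduler would give unused slices to whoever still
 * has work queued. */
#include <stdio.h>

struct queue {
	const char *name;
	unsigned int weight;	/* relative share */
};

int main(void)
{
	struct queue queues[] = {
		{ "compositor", 300 },
		{ "game",       100 },
		{ "background",  50 },
	};
	const unsigned int nq = sizeof(queues) / sizeof(queues[0]);
	const unsigned int period_us = 10000;	/* 10 ms period */
	unsigned int total = 0;

	for (unsigned int i = 0; i < nq; i++)
		total += queues[i].weight;

	for (unsigned int i = 0; i < nq; i++)
		printf("%-11s -> %u us per period\n", queues[i].name,
		       period_us * queues[i].weight / total);
	return 0;
}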
Regards,
Felix
If that's the case, it might be best to implement proportional control
on top of those hardware/firmware mechanisms. Work-conserving
mechanisms are the most versatile, the easiest to use, and the least
likely to cause regressions.
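
As a rough sketch of what that could look like from userspace, by
analogy with the existing cgroup v2 cpu.weight file (range 1..10000,
default 100): a hypothetical gpu.weight knob with the same shape. The
cgroup path /sys/fs/cgroup/app and the gpu.weight file are assumptions
for illustration; only cpu.weight exists today.

/* Sketch only: write a proportional weight into a cgroup v2 interface
 * file. cpu.weight is real; gpu.weight is hypothetical. */
#include <stdio.h>
#include <string.h>
#include <errno.h>

static int set_weight(const char *cgroup, const char *file, unsigned int weight)
{
	char path[512];
	snprintf(path, sizeof(path), "%s/%s", cgroup, file);

	FILE *f = fopen(path, "w");
	if (!f) {
		fprintf(stderr, "open %s: %s\n", path, strerror(errno));
		return -1;
	}
	fprintf(f, "%u\n", weight);
	fclose(f);
	return 0;
}

int main(void)
{
	/* Real knob: proportional cpu time for this cgroup. */
	set_weight("/sys/fs/cgroup/app", "cpu.weight", 200);

	/* Hypothetical knob of the same shape for gpu time. */
	set_weight("/sys/fs/cgroup/app", "gpu.weight", 200);
	return 0;
}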
Thanks.