Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

Hi Tejun,

Thanks for taking the time to reply.

Perhaps we can even narrow things down to just gpu.weight/gpu.compute.weight as a start?  If so, is the key objection to the current implementation of gpu.compute.weight the work-conserving requirement?  That requirement is probably what I have been missing for the last two years (and hence going in circles).

If this is the case, can you clarify/confirm the following?

1) Is the resource-scheduling goal of cgroup purely throughput (at the expense of other scheduling goals such as latency)?
2) If 1) is true, under what circumstances would the "Allocations" resource-distribution model (as defined in the cgroup-v2 documentation) be acceptable?
3) If 1) is true, are things like cpuset from cgroup v1 no longer acceptable going forward?

To be clear, while some have framed this (time sharing vs spatial sharing) as a partisan issue, it is in fact a technical one.  I implemented the gpu cgroup support this way because we have a class of users who value low latency, low jitter, predictability, and synchronicity.  For example, they would like four tasks to share a GPU and to have those tasks start and finish at the same time.
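
To make that concrete, here is a rough userspace sketch of the spatial-sharing ("Allocations") style of control I have in mind.  Note that the control file name (gpu.compute.amount), the device name and the unit counts below are placeholders for illustration only, not the actual interface from this series:

    # Illustrative only: assumes a hypothetical per-device "gpu.compute.amount"
    # file that reserves a fixed share of compute units for a cgroup.
    from pathlib import Path

    CGROOT = Path("/sys/fs/cgroup")

    def reserve_compute(cgroup, device, units):
        # Non-work-conserving: the reservation holds even when siblings are
        # idle, so every job sees the same capacity and the same timing.
        (CGROOT / cgroup / "gpu.compute.amount").write_text(f"{device} {units}\n")

    # Four latency-sensitive jobs, each pinned to a quarter of a 64-CU device
    # so that they start and finish together:
    for job in ("job0", "job1", "job2", "job3"):
        reserve_compute(job, "card0", 16)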

What is the rationale behind picking the Weight model over Allocations as the first acceptable implementation?  Can't we have both work-conserving and non-work-conserving ways of distributing GPU resources?  If we can, why not allow a non-work-conserving implementation first, especially when we have users asking for such functionality?
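
For comparison, the work-conserving weight model you sketch below would presumably be driven the same way, just with a relative weight instead of a fixed reservation.  The file names here follow your sketch; the value syntax (an io.weight-like "default N" and a per-device "device bytes") is purely my guess:

    # Illustrative only: gpu.weight / gpu.memory.high as sketched below; the
    # exact value syntax is a guess modelled on io.weight and memory.high.
    from pathlib import Path

    CGROOT = Path("/sys/fs/cgroup")

    def set_gpu_weight(cgroup, weight):
        # Work-conserving: unused GPU time flows to whichever sibling is
        # runnable, maximizing throughput but letting per-job latency and
        # completion time vary with the neighbours' load.
        (CGROOT / cgroup / "gpu.weight").write_text(f"default {weight}\n")

    def set_gpu_memory_high(cgroup, device, nbytes):
        # Per-device limit on on-device memory, analogous to memory.high.
        (CGROOT / cgroup / "gpu.memory.high").write_text(f"{device} {nbytes}\n")

    set_gpu_weight("job0", 300)                    # 3x the default share of 100
    set_gpu_memory_high("job0", "card0", 4 << 30)  # cap at 4 GiB of VRAM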

Regards,
Kenny



From: Tejun Heo <htejun@xxxxxxxxx> on behalf of Tejun Heo <tj@xxxxxxxxxx>
Sent: Monday, April 13, 2020 3:11 PM
To: Kenny Ho <y2kenny@xxxxxxxxx>
Cc: Ho, Kenny <Kenny.Ho@xxxxxxx>; cgroups@xxxxxxxxxxxxxxx <cgroups@xxxxxxxxxxxxxxx>; dri-devel <dri-devel@xxxxxxxxxxxxxxxxxxxxx>; amd-gfx list <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Greathouse, Joseph <Joseph.Greathouse@xxxxxxx>; jsparks@xxxxxxxx <jsparks@xxxxxxxx>
Subject: Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem
 
Hello, Kenny.

On Tue, Mar 24, 2020 at 02:49:27PM -0400, Kenny Ho wrote:
> Can you elaborate more on what the missing pieces are?

Sorry about the long delay, but I think we've been going in circles for quite
a while now. Let's try to make it really simple as the first step. How about
something like the following?

* gpu.weight (should it be gpu.compute.weight? idk) - A single number
  per-device weight similar to io.weight, which distributes computation
  resources in a work-conserving way.

* gpu.memory.high - A single number per-device on-device memory limit.

The above two, if they work well, should already be plenty useful. And my guess
is that getting them working well will be plenty challenging, even though that
already excludes work-conserving memory distribution. So, let's please do that
as the first step and see what more would be needed from there.

Thanks.

--
tejun
