On Fri, May 7, 2021 at 3:33 PM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Fri, May 07, 2021 at 06:54:13PM +0200, Daniel Vetter wrote:
> > All I meant is that for the container/cgroups world starting out with
> > time-sharing feels like the best fit, not least because your SRIOV designers
> > also seem to think that's the best first cut for cloud-y computing.
> > Whether it's virtualized or containerized is a distinction that's getting
> > ever more blurry, with virtualization becoming a lot more dynamic and
> > container runtimes also possibly using hw virtualization underneath.
>
> FWIW, I'm completely in the same boat. There are two fundamental issues with
> hardware-mask based control - control granularity and work conservation.
> Combined, they make it a significantly more difficult interface to use which
> requires hardware-specific tuning rather than simply being able to say "I
> wanna prioritize this job twice over that one".
>
> My knowledge of gpus is really limited but my understanding is also that the
> gpu cores and threads aren't as homogeneous as the CPU counterparts across
> the vendors, product generations and possibly even within a single chip,
> which makes the problem even worse.
>
> Given that GPUs are time-shareable to begin with, the most universal
> solution seems pretty clear.

The problem is that temporal partitioning on GPUs is much harder to
enforce unless you have a special case like SR-IOV. Spatial partitioning,
on AMD GPUs at least, is widely available and easily enforced. What is
the point of implementing temporal-style cgroups if no one can enforce
them effectively?

Alex

>
> Thanks.
>
> --
> tejun