Re: [RFC v2 2/2] cgroup: sev: Miscellaneous cgroup documentation.

Jacob Pan <jacob.jun.pan@xxxxxxxxx> · Mon, 15 Mar 2021 15:11:55 -0700

Hi Tejun,

On Sat, 13 Mar 2021 13:05:36 -0500, Tejun Heo <tj@xxxxxxxxxx> wrote:

> Hello,
> 
> On Sat, Mar 13, 2021 at 08:57:01AM -0800, Jacob Pan wrote:
> > Isn't PIDs controller doing the charge/uncharge? I was under the
> > impression that each resource can be independently charged/uncharged,
> > why it affects other resources? Sorry for the basic question.  
> 
> Yeah, PID is an exception as we needed the initial migration to seed new
> cgroups and it gets really confusing with other ways to observe the
> processes - e.g. if you follow the original way of creating a cgroup,
> forking and then moving the seed process into the target cgroup, if we
> don't migrate the pid charge together, the numbers wouldn't agree and the
> seeder cgroup may end up running out of pids if there are any
> restrictions.
> 
Thanks for explaining. Unfortunately, it seems IOASIDs have a similar need
in terms of migrating the charge.
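
To check that I read the PID case correctly, here is a toy model of the
seeding flow you describe (fork in the seeder cgroup, then move the new
process into its target cgroup). It is only a sketch, not kernel code: the
Cgroup class, fork_into(), migrate() and seed_children() are names I made
up for illustration, and the seeder process itself is not counted.

```python
# Toy model of pids accounting during cgroup seeding; not kernel code.
# All names here are hypothetical and exist only for illustration.
class Cgroup:
    def __init__(self, name, pids_max=None):
        self.name = name
        self.pids_max = pids_max  # None means no limit
        self.charged = 0          # what the controller has accounted
        self.procs = set()        # what a listing of member processes shows

    def fork_into(self, pid):
        # A fork is charged to the cgroup in which it happens.
        if self.pids_max is not None and self.charged >= self.pids_max:
            raise RuntimeError(f"{self.name}: pids limit reached")
        self.charged += 1
        self.procs.add(pid)


def migrate(pid, src, dst, move_charge):
    # Membership always moves; the charge moves only if move_charge is set.
    src.procs.remove(pid)
    dst.procs.add(pid)
    if move_charge:
        src.charged -= 1
        dst.charged += 1


def seed_children(move_charge, n=3):
    # Fork n processes in a limited seeder cgroup, moving each to a new one.
    seeder = Cgroup("seeder", pids_max=2)
    targets = []
    for pid in range(n):
        seeder.fork_into(pid)
        target = Cgroup(f"child{pid}")
        migrate(pid, seeder, target, move_charge)
        targets.append(target)
    return seeder, targets
```

With move_charge=True the seeder's count drops back to zero after every
move and matches its (empty) process list. With move_charge=False the
seeder keeps accumulating charges for processes it no longer contains and
hits its limit on the third fork.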

> > I also didn't quite get the limitation on cgroup v2 migration, this is
> > much simpler than memcg. Could you give me some pointers?  
> 
> Migration itself doesn't have restrictions but all resources are
> distributed on the same hierarchy, so the controllers are supposed to
> follow the same conventions that can be implemented by all controllers.
> 
Got it, I guess that is the behavior required by the unified hierarchy.
Would cgroup v1 be OK? I am guessing, though, that we are not extending v1?

> > BTW, since the IOASIDs are used to tag DMA and bound with guest
> > process(mm) for shared virtual addressing. fork() cannot be supported,
> > so I guess clone is not a solution here.  
> 
> Can you please elaborate what wouldn't work? The new spawning into a new
> cgroup w/ clone doesn't really change the usage model. It's just a neater
> way to seed a new cgroup. If you're saying that the overall usage model
> doesn't fit the needs of IOASIDs, it likely shouldn't be a cgroup
> controller.
> 
The IOASIDs are programmed into devices to generate DMA requests tagged
with them. The IOMMU has a per-device IOASID table in which each entry has
two pointers:
 - the PGD of the guest process
 - the PGD of the host process

The result of this two-stage (nested) translation is that we can share
virtual addresses (SVA) between the guest process and DMA. The host process
needs to allocate multiple IOASIDs, since one IOASID is needed for each
guest process that wants SVA.
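
As a rough illustration of the table lookup, here is a small sketch. It
assumes the usual nested arrangement, where the guest PGD maps guest
virtual to guest physical pages and the host PGD maps guest physical to
host physical pages. Page tables are flattened to dicts, and IoasidEntry,
translate() and the 4K page size are my own illustrative choices, not the
hardware format.

```python
from dataclasses import dataclass

PAGE = 4096  # assumed page size, for illustration only


@dataclass
class IoasidEntry:
    # Stand-ins for the two PGD pointers in one IOASID table entry.
    guest_pgd: dict  # first stage: guest virtual page -> guest physical page
    host_pgd: dict   # second stage: guest physical page -> host physical page


def translate(table, ioasid, gva):
    """Resolve a DMA address tagged with `ioasid` through both stages."""
    entry = table[ioasid]             # per-device table, indexed by IOASID
    vpn, offset = divmod(gva, PAGE)
    gpn = entry.guest_pgd[vpn]        # stage 1: guest process page table
    hpn = entry.host_pgd[gpn]         # stage 2: host process page table
    return hpn * PAGE + offset
```

Each guest process gets its own entry, hence its own IOASID, which is why
one host process ends up holding several of them.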

The DMA binding among device-IOMMU-process is set up via a series of user
APIs (e.g. via VFIO).

If a process calls fork(), the child does not inherit the IOASIDs or their
bindings. A child that wishes to use SVA has to call those APIs to
establish the binding for itself.

Therefore, if a host process allocates 10 IOASIDs and then does a
fork()/clone(), the 10 IOASIDs cannot be charged to the new cgroup, i.e. the
10 IOASIDs stay with the allocating process wherever it goes.
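
In toy form, the accounting I have in mind looks roughly like this. It is
a sketch of my reading only: Process, alloc_ioasid(), fork(), migrate() and
the charges dict are hypothetical names, not an existing kernel or cgroup
interface.

```python
import itertools

_next_ioasid = itertools.count(1)
charges = {}  # cgroup name -> number of IOASIDs charged (toy accounting)


class Process:
    def __init__(self, cgroup):
        self.cgroup = cgroup
        self.ioasids = set()

    def alloc_ioasid(self):
        # Allocation charges the cgroup the process currently lives in.
        ioasid = next(_next_ioasid)
        self.ioasids.add(ioasid)
        charges[self.cgroup] = charges.get(self.cgroup, 0) + 1
        return ioasid

    def fork(self, cgroup=None):
        # The child inherits neither IOASIDs nor bindings, so nothing is
        # charged on its behalf, even if it is spawned into another cgroup.
        return Process(cgroup or self.cgroup)

    def migrate(self, dst):
        # The IOASIDs stay with their owner, so the charge follows it.
        n = len(self.ioasids)
        charges[self.cgroup] = charges.get(self.cgroup, 0) - n
        charges[dst] = charges.get(dst, 0) + n
        self.cgroup = dst
```

Forking into cgroup B leaves B with a charge of zero, while moving the
allocating process from A to C moves all 10 charges along with it.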

I feel this fits the domain model, true?

> Thanks.
> 

Thanks,

Jacob