Hi Jason,

On Thu, 25 Mar 2021 14:16:45 -0300, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Thu, Mar 25, 2021 at 10:02:36AM -0700, Jacob Pan wrote:
> > Hi Jean-Philippe,
> >
> > On Thu, 25 Mar 2021 11:21:40 +0100, Jean-Philippe Brucker
> > <jean-philippe@xxxxxxxxxx> wrote:
> >
> > > On Wed, Mar 24, 2021 at 03:12:30PM -0700, Jacob Pan wrote:
> > > > Hi Jason,
> > > >
> > > > On Wed, 24 Mar 2021 14:03:38 -0300, Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > > wrote:
> > > > > On Wed, Mar 24, 2021 at 10:02:46AM -0700, Jacob Pan wrote:
> > > > > > > Also wondering about device driver allocating auxiliary
> > > > > > > domains for their private use, to do iommu_map/unmap on
> > > > > > > private PASIDs (a clean replacement to super SVA, for
> > > > > > > example). Would that go through the same path as /dev/ioasid
> > > > > > > and use the cgroup of current task?
> > > > > >
> > > > > > For the in-kernel private use, I don't think we should restrict
> > > > > > based on cgroup, since there is no affinity to user processes. I
> > > > > > also think the PASID allocation should just use kernel API
> > > > > > instead of /dev/ioasid. Why would user space need to know the
> > > > > > actual PASID # for device private domains? Maybe I missed your
> > > > > > idea?
> > > > >
> > > > > There is not much in the kernel that isn't triggered by a
> > > > > process, I would be careful about the idea that there is a class
> > > > > of users that can consume a cgroup controlled resource without
> > > > > being inside the cgroup.
> > > > >
> > > > > We've got into trouble before overlooking this and with something
> > > > > greenfield like PASID it would be best built in to the API to
> > > > > prevent a mistake. eg accepting a cgroup or process input to the
> > > > > allocator.
> > > > Make sense. But I think we only allow charging the current cgroup,
> > > > how about I add the following to ioasid_alloc():
> > > >
> > > >     misc_cg = get_current_misc_cg();
> > > >     ret = misc_cg_try_charge(MISC_CG_RES_IOASID, misc_cg, 1);
> > > >     if (ret) {
> > > >         put_misc_cg(misc_cg);
> > > >         return ret;
> > > >     }
> > >
> > > Does that allow PASID allocation during driver probe, in kernel_init
> > > or modprobe context?
> > >
> > Good point. Yes, you can get cgroup subsystem state in kernel_init for
> > charging/uncharging. I would think module_init should work also since
> > it is after kernel_init. I have tried the following:
> > static int __ref kernel_init(void *unused)
> >  {
> >         int ret;
> > +       struct cgroup_subsys_state *css;
> > +       css = task_get_css(current, pids_cgrp_id);
> >
> > But that would imply:
> > 1. IOASID has to be built-in, not as module
> > 2. IOASIDs charged on PID1/init would not subject to cgroup limit since
> > it will be in the root cgroup and we don't support migration nor will
> > migrate.
> >
> > Then it comes back to the question of why do we try to limit in-kernel
> > users per cgroup if we can't enforce these cases.
>
> Are these real use cases? Why would a driver binding to a device
> create a single kernel pasid at bind time? Why wouldn't it use
> untagged DMA?
>
For VT-d, I don't see such use cases. All PASID allocations by the kernel
drivers have proper process context.

> When someone needs it they can rework it and explain why they are
> doing something sane.
>
Agreed.

> Jason

Thanks,

Jacob