On Thu, Oct 06, 2016 at 07:19:24PM +0530, Parav Pandit wrote:
> Hi Leon,
>
> On Wed, Oct 5, 2016 at 4:52 PM, Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> > On Wed, Aug 31, 2016 at 02:07:24PM +0530, Parav Pandit wrote:
> >> rdmacg: IB/core: rdma controller support
> >>
> >> Overview:
> >> Currently user space applications can easily take away all the rdma
> >> device specific resources such as AH, CQ, QP, MR etc. Due to which other
> >> applications in other cgroup or kernel space ULPs may not even get a chance
> >> to allocate any rdma resources. This results in service unavailability.
> >>
> >> RDMA cgroup addresses this issue by allowing resource accounting,
> >> limit enforcement on per cgroup, per rdma device basis.
> >>
> >> RDMA uverbs layer will enforce limits on well defined RDMA verb
> >> resources without any HCA vendor device driver involvement.
> >>
> >> RDMA uverbs layer will not do limit enforcement of HCA hw vendor
> >> specific resources. Instead rdma cgroup provides set of APIs
> >> through which vendor specific drivers can do resource accounting
> >> by making use of rdma cgroup.
> >
> > Hi Parav,
> > I want to propose an extension to the RDMA cgroup which can be done as
> > follow-up patches.
> >
> > Let's add new global type, which will control whole HCA (for example in percentages). It will
> > allow natively define new objects without need to introduce them to the user.
> >
> In other cgroups such as CPU, this is done using the cpu.weight API, where
> the percentage or weight is configured by the user.
> In this mode, resources are taken away from other cgroups proportionately.
> It works for cpu because it's mainly a stateless resource, unlike rdma
> resources.
> So if we want to simplify user configuration similarly,
> percentage/weight configuration can be extended.
> This way they need not be introduced to users.
> I hope your definition of "user" is actual end-user and not rdma cgroup. Right?

Yes, "user" -> "admin".
I think a percentage is more intuitive to them and will make it much
easier to explain how to use it. I always have in mind the "swappiness"
field and the numerous questions about how to configure it.

> In other words, new object should be still added as new enum value in
> rdma_cgroup.h?

Yes, I had in mind something like IB_CGROUP_HCA; this is why it can be
done as future work after the current patches are accepted.

> Only then it can be overwritten by specific UVERBS type as you
> described below. I think that's what you meant as you described below.

Exactly.

>
> Otherwise charging/uncharging this new percentage resource can get messy.

Agree.

>
> > This HCA share will be overwritten by specific UVERBS types which you
> > already defined.
> >
> > What do you think?
>
> So to refine your proposal from cgroup perspective, instead of adding
> new resource type in rdma_cgroup.h for percentage, I prefer to have
>
> Existing
> 1. rdma.max
> 2. rdma.current
> New,
> 3. rdma.weight
> This ABI will have similar API to say
> echo "mlx4_0 50" > rdma.weight.
> Where 50 is weight of the resources.
> For example,
> for one cgroup instance weight=sum=100% resource for a given cgroup.
> for three cgroup instances percentage=(weight/sum)% = 50/(50+50+50) = 33%.
> One cgroup gets 33% resource.
>
> Weight can be in range of 1 to 10,000 similar to cpu cgroup.

This is exactly what I don't like. A percentage removes the need for the
user to translate between a weight and the actual limit. IMHO CPU used
weights because everything there is in weights :).

>
> This might work if applications running in all cgroups are similar.
> But weight doesn't do justice when there are different types of
> applications running in each cgroup, such as a few running libfabric
> based apps, a few running MPI, others directly using ibverbs.
> So as you said, rdma.max configuration would be required for the management
> plane to override weight (percentage) for certain resources.

Why?
The device exposes max values during initialization, and if a user asks
for 20% of the HCA, he will get max*0.2.

>
> >
> >
> > Except this proposal,
> > Reviewed-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> >
>
> Thanks.