Re: [PATCHv12 0/3] rdmacg: IB/core: rdma controller support

On Thu, Oct 06, 2016 at 07:19:24PM +0530, Parav Pandit wrote:
> Hi Leon,
>
> On Wed, Oct 5, 2016 at 4:52 PM, Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> > On Wed, Aug 31, 2016 at 02:07:24PM +0530, Parav Pandit wrote:
> >> rdmacg: IB/core: rdma controller support
> >>
> >> Overview:
> >> Currently user space applications can easily take away all the rdma
> >> device specific resources such as AH, CQ, QP, MR etc. As a result, other
> >> applications in other cgroups or kernel space ULPs may not even get a chance
> >> to allocate any rdma resources. This results in service unavailability.
> >>
> >> RDMA cgroup addresses this issue by allowing resource accounting,
> >> limit enforcement on per cgroup, per rdma device basis.
> >>
> >> RDMA uverbs layer will enforce limits on well defined RDMA verb
> >> resources without any HCA vendor device driver involvement.
> >>
> >> RDMA uverbs layer will not do limit enforcement of HCA hw vendor
> >> specific resources. Instead rdma cgroup provides a set of APIs
> >> through which vendor specific drivers can do resource accounting
> >> by making use of rdma cgroup.
> >
> > Hi Parav,
> > I want to propose an extension to the RDMA cgroup which can be done as
> > follow-up patches.
> >
> > Let's add a new global type, which will control the whole HCA (for example
> > in percentages). It will allow new objects to be defined natively without
> > needing to introduce them to the user.
> >
> In other cgroup such as CPU, this is done using cpu.weight API. Where
> percentage or weight is configured by the user.
> In this mode, resources are taken away from other cgroups proportionately.
> It works for cpu because it's mainly a stateless resource, unlike rdma
> resources.
> So if we want to simplify user configuration similarly,
> percentage/weight configuration can be extended.
> This way they need not be introduced to users.
> I hope your definition of "user" is actual end-user and not rdma cgroup. Right?

Yes, "user" -> "admin".
I think that a percentage is more intuitive to them and will be much easier
to explain. I always have in mind the "swappiness" field and the
numerous questions on how to configure it.

> In other words, new object should be still added as new enum value in
> rdma_cgroup.h?

Yes, I had in mind something like IB_CGROUP_HCA, which is why it can be
done as future work after the current patches are accepted.

> Only then can it be overwritten by specific UVERBS types as you
> described below. I think that's what you meant.

Exactly.

>
> Otherwise charging/uncharging this new percentage resource can get messy.

Agree

>
> > This HCA share will be overwritten by specific UVERBS types which you
> > already defined.
> >
> > What do you think?
>
> So to refine your proposal from cgroup perspective, instead of adding
> new resource type in rdma_cgroup.h for percentage, I prefer to have
>
> Existing
> 1. rdma.max
> 2. rdma.current
> New,
> 3. rdma.weight
> This ABI will have an API similar to, say,
> echo "mlx4_0 50" > rdma.weight
> where 50 is the weight of the resources.
> For example,
> for one cgroup instance, weight = sum, so that cgroup gets 100% of the resource.
> For three cgroup instances, percentage = (weight/sum)% = 50/(50+50+50) = 33%,
> so each cgroup gets 33% of the resource.
>
> Weight can be in range of 1 to 10,000 similar to cpu cgroup.

This is exactly what I don't like. A percentage spares the user from
having to translate between a weight and the actual limit.

IMHO CPU used weights because everything there is in weights :).
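
To make the translation step concrete, here is a minimal sketch of the
arithmetic behind the quoted rdma.weight example. The function and cgroup
names are hypothetical and only illustrate the math; nothing like this is
part of the patch set.

```python
def weight_shares(weights):
    """Map each cgroup's weight to its fraction of one device.

    `weights` is a hypothetical {cgroup_name: weight} mapping; the
    share of a cgroup is weight / sum(all weights).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one positive weight is required")
    return {name: w / total for name, w in weights.items()}


# A single cgroup owns the whole device regardless of its weight.
one = weight_shares({"cg_a": 50})

# Three cgroups with equal weights each get one third.
three = weight_shares({"cg_a": 50, "cg_b": 50, "cg_c": 50})
```

Note that the share of a given cgroup changes whenever a sibling is added
or its weight changes, which is the translation the admin would have to
keep in mind.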

>
> This might work if applications running in all cgroups are similar.
> But weight doesn't do justice when there are different types of
> applications running in each cgroup, such as a few running libfabric
> based apps, a few running MPI, and others directly using ibverbs.
> So as you said rdma.max configuration would be required for management
> plane to override weight (percentage) for certain resources.

Why?
The device exposes its max values during initialization, so if the user
asks for 20% of the HCA, he will get max*0.2.
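
As a rough sketch of what I mean, assuming the per-resource maximums
reported by the device are available as a simple mapping (the function
name, resource keys and numbers below are made up for illustration):

```python
def hca_share_limits(device_max, percent):
    """Derive per-resource limits from a single HCA percentage.

    `device_max` is a hypothetical {resource: device maximum} mapping.
    Limits are rounded down so the result never exceeds the share.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return {res: (limit * percent) // 100 for res, limit in device_max.items()}


# Illustrative numbers only, not taken from any real device.
example_max = {"qp": 1000, "cq": 500, "mr": 2000, "ah": 100}
limits = hca_share_limits(example_max, 20)
```

Unlike a weight, the resulting limit does not depend on what the other
cgroups are configured with.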

>
>
> >
> > Except this proposal,
> > Reviewed-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> >
> > Thanks.
