Re: [PATCHv1 3/6] rdmacg: implements rdma cgroup

Parav Pandit <pandit.parav@xxxxxxxxx> · Thu, 7 Jan 2016 05:03:07 +0530

On Wed, Jan 6, 2016 at 3:31 AM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> Hello,
>
> On Wed, Jan 06, 2016 at 12:28:03AM +0530, Parav Pandit wrote:
>> +/* hash table to keep map of tgid to owner cgroup */
>> +DEFINE_HASHTABLE(pid_cg_map_tbl, 7);
>> +DEFINE_SPINLOCK(pid_cg_map_lock);    /* lock to protect hash table access */
>> +
>> +/* Keeps mapping of pid to its owning cgroup at rdma level,
>> + * This mapping doesn't change, even if process migrates from one to other
>> + * rdma cgroup.
>> + */
>> +struct pid_cg_map {
>> +     struct pid *pid;                /* hash key */
>> +     struct rdma_cgroup *cg;
>> +
>> +     struct hlist_node hlist;        /* pid to cgroup hash table link */
>> +     atomic_t refcnt;                /* count active user tasks to figure out
>> +                                      * when to free the memory
>> +                                      */
>> +};
>
> Ugh, there's something clearly wrong here.  Why does the rdma
> controller need to keep track of pid cgroup membership?
>
Rdma resource can be allocated by parent process, used and freed by
child process.
Child process could belong to different rdma cgroup.
Parent process might have been terminated after creation of rdma
cgroup. (Followed by cgroup might have been deleted too).
Its discussed in https://lkml.org/lkml/2015/11/2/307

In nutshell, there is process that clearly owns the rdma resource.
So to keep the design simple, rdma resource is owned by the creator
process and cgroup without modifying the task_struct.

>> +static void _dealloc_cg_rpool(struct rdma_cgroup *cg,
>> +                           struct cg_resource_pool *rpool)
>> +{
>> +     spin_lock(&cg->cg_list_lock);
>> +
>> +     /* if its started getting used by other task,
>> +      * before we take the spin lock, then skip,
>> +      * freeing it.
>> +      */
>
> Please follow CodingStyle.
>
>> +     if (atomic_read(&rpool->refcnt) == 0) {
>> +             list_del_init(&rpool->cg_list);
>> +             spin_unlock(&cg->cg_list_lock);
>> +
>> +             _free_cg_rpool(rpool);
>> +             return;
>> +     }
>> +     spin_unlock(&cg->cg_list_lock);
>> +}
>> +
>> +static void dealloc_cg_rpool(struct rdma_cgroup *cg,
>> +                          struct cg_resource_pool *rpool)
>> +{
>> +     /* Don't free the resource pool which is created by the
>> +      * user, otherwise we miss the configured limits. We don't
>> +      * gain much either by splitting storage of limit and usage.
>> +      * So keep it around until user deletes the limits.
>> +      */
>> +     if (atomic_read(&rpool->creator) == RDMACG_RPOOL_CREATOR_DEFAULT)
>> +             _dealloc_cg_rpool(cg, rpool);
>
> I'm pretty sure you can get away with an fixed length array of
> counters.  Please keep it simple.  It's a simple hard limit enforcer.
> There's no need to create a massive dynamic infrastrucure.
>
Every resource pool for verbs resource is fixed length array. Length
of the array is defined by the IB stack modules.
This array is per cgroup, per device.
Its per device, because we agreed that we want to address requirement
of controlling/configuring them on per device basis.
Devices appear and disappear. Therefore they are allocated dynamically.
Otherwise this array could be static in cgroup structure.

> Thanks.
>
> --
> tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html