From: Tejun Heo <tj@xxxxxxxxxx> Date: Tue, 14 Mar 2017 19:25:56 -0400 > The net_cls controller controls the classid field of each socket which > is associated with the cgroup. Because the classid is per-socket > attribute, when a task migrates to another cgroup or the configured > classid of the cgroup changes, the controller needs to walk all > sockets and update the classid value, which was implemented by > 3b13758f51de ("cgroups: Allow dynamically changing net_classid"). > > While the approach is not scalable, migrating tasks which have a lot > of fds attached to them is rare and the cost is born by the ones > initiating the operations. However, for simplicity, both the > migration and classid config change paths call update_classid() which > scans all fds of all tasks in the target css. This is an overkill for > the migration path which only needs to cover a much smaller subset of > tasks which are actually getting migrated in. > > On cgroup v1, this can lead to unexpected scalability issues when one > tries to migrate a task or process into a net_cls cgroup which already > contains a lot of fds. Even if the migration traget doesn't have many > to get scanned, update_classid() ends up scanning all fds in the > target cgroup which can be extremely numerous. > > Unfortunately, on cgroup v2 which doesn't use net_cls, the problem is > even worse. Before bfc2cf6f61fc ("cgroup: call subsys->*attach() only > for subsystems which are actually affected by migration"), cgroup core > would call the ->css_attach callback even for controllers which don't > see actual migration to a different css. > > As net_cls is always disabled but still mounted on cgroup v2, whenever > a process is migrated on the cgroup v2 hierarchy, net_cls sees > identity migration from root to root and cgroup core used to call > ->css_attach callback for those. The net_cls ->css_attach ends up > calling update_classid() on the root net_cls css to which all > processes on the system belong to as the controller isn't used. This > makes any cgroup v2 migration O(total_number_of_fds_on_the_system) > which is horrible and easily leads to noticeable stalls triggering RCU > stall warnings and so on. > > The worst symptom is already fixed in upstream by bfc2cf6f61fc > ("cgroup: call subsys->*attach() only for subsystems which are > actually affected by migration"); however, backporting that commit is > too invasive and we want to avoid other cases too. > > This patch updates net_cls's cgrp_attach() to iterate fds of only the > processes which are actually getting migrated. This removes the > surprising migration cost which is dependent on the total number of > fds in the target cgroup. As this leaves write_classid() the only > user of update_classid(), open-code the helper into write_classid(). > > Reported-by: David Goode <dgoode@xxxxxx> > Fixes: 3b13758f51de ("cgroups: Allow dynamically changing net_classid") > Cc: stable@xxxxxxxxxxxxxxx # v4.4+ > Cc: Nina Schiff <ninasc@xxxxxx> > Cc: David S. Miller <davem@xxxxxxxxxxxxx> > Signed-off-by: Tejun Heo <tj@xxxxxxxxxx> Applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe cgroups" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html