On Mon, 2017-03-13 at 15:26 -0400, Tejun Heo wrote:
> Hello, Mike.
>
> Sorry about the long delay.
>
> On Mon, Feb 13, 2017 at 06:45:07AM +0100, Mike Galbraith wrote:
> > > > So, as long as the depth stays reasonable (single digit or lower),
> > > > what we try to do is keeping tree traversal operations aggregated or
> > > > located on slow paths.  There still are places that this overhead
> > > > shows up (e.g. the block controllers aren't too optimized) but it
> > > > isn't particularly difficult to make a handful of layers not matter
> > > > at all.
> > >
> > > A handful of cpu bean counting layers stings considerably.
>
> Hmm... yeah, I was trying to think about ways to avoid full scheduling
> overhead at each layer (the scheduler does a lot per each layer of
> scheduling) but don't think it's possible to circumvent that without
> introducing a whole lot of scheduling artifacts.

Yup.

> In a lot of workloads, the added overhead from several layers of CPU
> controllers doesn't seem to get in the way too much (most threads do
> something other than scheduling after all).

Sure, if you don't schedule a lot it doesn't hurt much, but there are
plenty of loads that routinely do schedule a LOT, and there it matters
a LOT... which is why network benchmarks tend to be severely allergic
to scheduler lard.

> The only major issue that we're seeing in the fleet is the cgroup
> iteration in idle rebalancing code pushing up the scheduling latency
> too much but that's a different issue.

Hm, I would suspect PELT to be the culprit there.  It helps smooth out
load balancing, but will stack "skinny looking" tasks.

	-Mike
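
To illustrate the per-layer cost both are describing: with
CONFIG_FAIR_GROUP_SCHED, enqueueing a task walks its chain of ancestor
group sched entities (for_each_sched_entity() in kernel/sched/fair.c),
so every wakeup pays the enqueue bookkeeping once per cgroup level.
Below is a minimal toy model of that walk; the parent chain mirrors the
kernel's, but the struct layout and the depth of 5 are assumptions made
purely for illustration.

/*
 * Toy model only, not kernel code: with CONFIG_FAIR_GROUP_SCHED the
 * fair class walks the task's ancestor sched entities on enqueue
 * (cf. for_each_sched_entity() / enqueue_entity() in
 * kernel/sched/fair.c), so each cgroup layer adds one round of
 * vruntime/load bookkeeping per wakeup.  Struct layout and depth
 * below are made-up assumptions for illustration.
 */
#include <stdio.h>
#include <stdlib.h>

struct toy_sched_entity {
	struct toy_sched_entity *parent;	/* NULL at the root rq */
};

/* Count the hierarchy levels a single enqueue has to touch. */
static unsigned long enqueue_levels(struct toy_sched_entity *se)
{
	unsigned long levels = 0;

	for (; se; se = se->parent)	/* cf. for_each_sched_entity() */
		levels++;		/* one enqueue_entity() per level */
	return levels;
}

int main(void)
{
	const int depth = 5;	/* e.g. root/machine/slice/scope/task */
	struct toy_sched_entity *se = NULL;

	for (int i = 0; i < depth; i++) {
		struct toy_sched_entity *child = calloc(1, sizeof(*child));
		child->parent = se;
		se = child;
	}
	printf("depth %d => %lu rounds of enqueue bookkeeping per wakeup\n",
	       depth, enqueue_levels(se));
	return 0;
}

A flat hierarchy pays that bookkeeping once per wakeup; each added CPU
controller layer adds another round, which a switch-heavy load (e.g. a
netperf TCP_RR style ping-pong) pays on every single wakeup.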
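
On the "skinny looking" tasks point: PELT decays a task's load
contribution geometrically, with a half-life of 32 periods of roughly
1ms (1024us) each, i.e. y^32 = 1/2, so a bursty task converges to a
load_avg near its duty cycle.  Here is a toy model of that decay; the
geometric series matches PELT's documented behavior, but the 10% duty
cycle and the period counts are made-up assumptions.

/*
 * Toy model of PELT's geometric decay, not the kernel's fixed-point
 * code: per ~1ms (1024us) period the load history decays by y, with
 * y^32 == 1/2, and a running period contributes 1024.  The 10% duty
 * cycle and period counts below are made-up assumptions.
 */
#include <math.h>
#include <stdio.h>

int main(void)
{
	const double y = pow(0.5, 1.0 / 32.0);	/* y^32 = 1/2 */
	double max = 0.0, sum = 0.0;
	int p;

	/* Always-running task: converges to the series' maximum. */
	for (p = 0; p < 345; p++)
		max = max * y + 1024.0;

	/* Bursty task: ~1ms of running, ~9ms of sleep, repeated. */
	for (p = 0; p < 345; p++)
		sum = sum * y + ((p % 10 == 0) ? 1024.0 : 0.0);

	printf("10%% duty cycle looks like %.0f/%.0f (~%.0f%%) to the "
	       "load balancer\n", sum, max, 100.0 * sum / max);
	return 0;
}

Build with -lm; it prints roughly 10%.  Ten such tasks together look
like about one CPU's worth of load, so the balancer is happy to stack
them, but when their bursts coincide each wants a full CPU at once,
which is one way to read the stacking and latency being pointed at
here.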