On 21/12/16 15:50, Vincent Guittot wrote:

The overall idea makes sense to me. Just a couple of small questions ...

> The update of the share of a cfs_rq is done when its load_avg is updated
> but before the group_entity's load_avg has been updated for the past time
> slot. This generates wrong load_avg accounting which can be significant
> when small tasks are involved in the scheduling.

Why for small tasks? Is it because we use
load = scale_load_down(cfs_rq->load.weight) and not
load = cfs_rq->avg.load_avg in calc_cfs_shares()?

> Let's take the example of a task a that is dequeued from its task group A:
>
>       root
>      (cfs_rq)
>         \
>         (se)
>          A
>        (cfs_rq)
>           \
>           (se)
>            a
>
> Task "a" was the only task in task group A, which becomes idle when a is
> dequeued.
>
> We have the sequence:
>
> - dequeue_entity a->se
>     - update_load_avg(a->se)
>     - dequeue_entity_load_avg(A->cfs_rq, a->se)
>     - update_cfs_shares(A->cfs_rq)
>         A->cfs_rq->load.weight == 0

You mean A->cfs_rq->load.weight = 0 ?

>         A->se->load.weight is updated with the new share (0 in this case)

Shouldn't this be MIN_SHARES (2) instead of 0?

> - dequeue_entity A->se
>     - update_load_avg(A->se) but its weight is now zero so the last time
>       slot (up to a tick) will be accounted with a weight of 0 instead of
>       its real weight during the time slot. The last time slot will be
>       accounted as an idle one whereas it was a running one.
>
> If the running time of task a is short enough that no tick happens when it
> runs, all running time of group entity A->se will be accounted as idle
> time.
>
> Instead, we should update the share of a cfs_rq (in fact the weight of its
> group entity) only after having updated the load_avg of the group_entity.

This is because we use 'se->on_rq * scale_load_down(se->load.weight)' in
__update_load_avg() as the weight parameter for the PELT load_avg update?

> update_cfs_shares() now takes the sched_entity as parameter instead of the
> cfs_rq, and the weight of the group_entity is updated only once its
> load_avg has been synced with the current time.

[...]
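
To make my MIN_SHARES question above concrete, this is how I read the shares
computation behind update_cfs_shares(): a heavily simplified, standalone
sketch of calc_cfs_shares(), not the kernel code itself, and the numbers are
made up; the details may not match the tree your patch is against.

#include <stdio.h>

#define MIN_SHARES	2L

/*
 * Simplified stand-in for calc_cfs_shares():
 *   load      ~ scale_load_down(cfs_rq->load.weight)
 *   tg_weight ~ the task group's total weight
 *   tg_shares ~ tg->shares
 */
static long calc_cfs_shares_sketch(long tg_shares, long tg_weight, long load)
{
	long shares = tg_shares * load;

	if (tg_weight)
		shares /= tg_weight;

	/* the result is clamped, so it never goes below MIN_SHARES */
	if (shares < MIN_SHARES)
		shares = MIN_SHARES;
	if (shares > tg_shares)
		shares = tg_shares;

	return shares;
}

int main(void)
{
	/* task a still enqueued: A->cfs_rq->load.weight still contributes */
	printf("shares while a is enqueued: %ld\n",
	       calc_cfs_shares_sketch(1024, 1024, 1024));

	/* a dequeued: A->cfs_rq->load.weight == 0 -> clamped to MIN_SHARES */
	printf("shares after a is dequeued: %ld\n",
	       calc_cfs_shares_sketch(1024, 1024, 0));

	return 0;
}

So I would expect A->se->load.weight to end up at MIN_SHARES (2) rather than
literally 0 here, though that doesn't change the problem you describe.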
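And to illustrate why I think the ordering matters for my
__update_load_avg() question: a toy model of accounting the group entity's
last time slot once with its old weight and once with the already-lowered
one. Again just a standalone sketch with made-up numbers; the weight used is
what the email above says gets passed, i.e. roughly
'se->on_rq * scale_load_down(se->load.weight)', and the real PELT geometric
decay per 1024us segment is ignored entirely.

#include <stdio.h>

/* toy contribution of one sub-tick time slot to the entity's load sum */
static unsigned long slot_contrib(unsigned long weight, unsigned long delta_us)
{
	return weight * delta_us;
}

int main(void)
{
	/* task a ran for less than a tick, so this slot is its whole runtime */
	unsigned long delta_us = 800;

	/* proposed order: A->se is synced while it still has its old weight */
	printf("group se synced before its weight is changed: %lu\n",
	       slot_contrib(1024, delta_us));

	/* old order: update_cfs_shares() already dropped it to the new share */
	printf("group se synced after update_cfs_shares():    %lu\n",
	       slot_contrib(2, delta_us));

	return 0;
}

With the running time shorter than a tick there is no earlier sync point, so
in the old order essentially all of A->se's running time is accounted with
the (near) zero weight, i.e. as idle time, which matches your description.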