On Wed, 23 Jun 2021 at 19:27, Vincent Guittot <vincent.guittot@xxxxxxxxxx> wrote:
>
> On Wed, 23 Jun 2021 at 18:55, Vincent Guittot
> <vincent.guittot@xxxxxxxxxx> wrote:
> >
> > On Wed, 23 Jun 2021 at 18:46, Sachin Sant <sachinp@xxxxxxxxxxxxxxxxxx> wrote:
> > >
> > >
> > > > Ok. This becomes even more weird. Could you share your config file and more details about
> > > > your setup?
> > > >
> > > > Have you applied the patch below?
> > > > https://lore.kernel.org/lkml/20210621174330.11258-1-vincent.guittot@xxxxxxxxxx/
> > > >
> > > > Regarding the load_avg warning, I can see a possible problem during attach. Could you add
> > > > the patch below? The load_avg warning seems to happen during boot and sched_entity
> > > > creation.
> > > >
> > >
> > > Here is a summary of my testing.
> > >
> > > I have a POWER box with the PowerVM hypervisor. On this box I have a logical partition (LPAR) or guest
> > > (allocated 32 CPUs and 90G of memory) running linux-next.
> > >
> > > I started with a clean slate.
> > > Moved to linux-next 5.13.0-rc7-next-20210622 as base code.
> > > Applied patch #1 from Vincent which contains changes to dequeue_load_avg()
> > > Applied patch #2 from Vincent which contains changes to enqueue_load_avg()
> > > Applied patch #3 from Vincent which contains changes to attach_entity_load_avg()
> > > Applied patch #4 from https://lore.kernel.org/lkml/20210621174330.11258-1-vincent.guittot@xxxxxxxxxx/
> > >
> > > With these changes applied I was still able to recreate the issue. I could see a kernel warning
> > > during boot.
> > >
> > > I then applied patch #5 from Odin which contains changes to update_cfs_rq_load_avg()
> > >
> > > With all 5 patches applied I was able to boot the kernel without any warning messages.
> > > I also ran the scheduler-related tests from LTP (./runltp -f sched). All tests, including cfs_bandwidth01,
> > > ran successfully. No kernel warnings were observed.
> >
> > OK, so Odin's patch fixes the problem, which highlights that we
> > overestimate _sum or don't sync _avg and _sum correctly.
> >
> > I'm going to look at this further.
>
> The problem is that "_avg * divider" assumes that all pending
> contributions are non-null, whereas they can be null.

Yeah.

> Odin's patch is the right way to fix this. The other patches should not be
> useful for your problem.

Ack. As I see it, given how PELT works now, it is the only way to
mitigate it (without doing a lot of extra PELT stuff).

Will post it as a patch together with a proper message later today or tomorrow.

>
> >
> > >
> > > Have also attached .config in case it is useful. config has CONFIG_HZ_100=y
> >
> > Thanks, I will have a look
> >
> > >
> > > Thanks
> > > -Sachin
> > >

Thanks for reporting, Sachin!

Thanks
Odin
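
To illustrate the "_avg * divider" point above, here is a toy model of the
mismatch. It is a minimal sketch in Python, not kernel code: the divider value,
the starting numbers, and the function names remove_load_estimated() and
remove_load_resynced() are all made up for illustration. The resync variant is
only assumed to resemble the approach discussed in this thread; the patch
itself is not shown here.

```python
# Toy model only: plain integers and invented numbers, not kernel code.
# DIVIDER is a hypothetical stand-in for the PELT divider.
DIVIDER = 1000


def sub_positive(value, delta):
    """Subtract delta from value without going below zero."""
    return max(value - delta, 0)


def remove_load_estimated(load_avg, load_sum, removed_avg, divider=DIVIDER):
    """Remove load from both fields, estimating the _sum share as
    removed_avg * divider. This assumes every pending contribution is
    non-null, so the estimate can be larger than what _sum really holds."""
    load_avg = sub_positive(load_avg, removed_avg)
    load_sum = sub_positive(load_sum, removed_avg * divider)
    return load_avg, load_sum


def remove_load_resynced(load_avg, load_sum, removed_avg, divider=DIVIDER):
    """Remove load from _avg, then recompute _sum from the new _avg so the
    two fields cannot drift apart (assumed approach, for illustration)."""
    load_avg = sub_positive(load_avg, removed_avg)
    load_sum = load_avg * divider
    return load_avg, load_sum


if __name__ == "__main__":
    # Assumed starting state: _sum is already below _avg * divider,
    # as can happen when some pending contributions are null.
    avg, total = 10, 8500

    # Estimated removal: _avg stays non-zero while _sum is clamped to zero.
    print(remove_load_estimated(avg, total, 9))  # (1, 0)

    # Resynced removal: _sum follows _avg.
    print(remove_load_resynced(avg, total, 9))   # (1, 1000)
```

In the first call the estimated subtraction (9 * 1000) exceeds the stored sum
(8500), so the sum is clamped to zero while the average is still 1. That is
the kind of _avg/_sum inconsistency the thread is about. The second call
avoids it by deriving the sum from the average.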