Hi Jon,

On 04/03/25 15:32, Jon Hunter wrote:
> Hi Juri,
>
> On 04/03/2025 08:40, Juri Lelli wrote:
> > Hello!
> >
> > Jon reported [1] a suspend regression on a Tegra board configured to
> > boot with isolcpus and bisected it to commit 53916d5fd3c0
> > ("sched/deadline: Check bandwidth overflow earlier for hotplug").
> >
> > Root cause analysis pointed out that we are currently failing to
> > correctly clear and restore bandwidth accounting on root domains after
> > changes that initiate from partition_sched_domains(), as it is the case
> > for suspend operations on that board.
> >
> > The way we currently make sure that accounting properly follows root
> > domain changes is quite convoluted and was indeed missing some corner
> > cases. So, instead of adding yet more fragile operations, I thought we
> > could simplify things by always clearing and rebuilding bandwidth
> > information on all domains after an update is complete. Also, we should
> > be ignoring DEADLINE special tasks when doing so (e.g. sugov), since we
> > ignore them already for runtime enforcement and admission control
> > anyway.
> >
> > The following implements the approach by:
> >
> > - 01/05: filter out DEADLINE special tasks
> > - 02/05: preparatory wrappers to be able to grab sched_domains_mutex on
> >   UP
> > - 03/05: generalize unique visiting of root domains so that we can
> >   re-use the mechanism elsewhere
> > - 04/05: the bulk of the approach, clean and rebuild after changes
> > - 05/05: clean up a now redundant call
> >
> > Please test and review. The set is also available at
> >
> > git@xxxxxxxxxx:jlelli/linux.git upstream/deadline/domains-suspend
>
>
> I know that this is still under review, but I have tested on my side and it
> is working for me, so feel free to include my ...
>
> Tested-by: Jon Hunter <jonathanh@xxxxxxxxxx>

Great to hear this and thanks for the super quick turn around with testing.

I will implement the changes that Waiman (and possibly others) is
suggesting and post a new version soon.

Best,
Juri