Hi Lukasz, On 25.06.2020 12:02, Lukasz Luba wrote: > Hi Sylwester, > > On 6/24/20 4:11 PM, Sylwester Nawrocki wrote: >> Hi All, >> >> On 24.06.2020 12:32, Lukasz Luba wrote: >>> I had issues with devfreq governor which wasn't called by devfreq >>> workqueue. The old DELAYED vs DEFERRED work discussions and my patches >>> for it [1]. If the CPU which scheduled the next work went idle, the >>> devfreq workqueue will not be kicked and devfreq governor won't check >>> DMC status and will not decide to decrease the frequency based on low >>> busy_time. >>> The same applies for going up with the frequency. They both are >>> done by the governor but the workqueue must be scheduled periodically. >> >> As I have been working on resolving the video mixer IOMMU fault issue >> described here: https://patchwork.kernel.org/patch/10861757 >> I did some investigation of the devfreq operation, mostly on Odroid U3. >> >> My conclusions are similar to what Lukasz says above. I would like to add >> that broken scheduling of the performance counters read and the devfreq >> updates seems to have one more serious implication. In each call, which >> normally should happen periodically with fixed interval we stop the counters, >> read counter values and start the counters again. But if period between >> calls becomes long enough to let any of the counters overflow, we will >> get wrong performance measurement results. My observations are that >> the workqueue job can be suspended for several seconds and conditions for >> the counter overflow occur sooner or later, depending among others >> on the CPUs load. >> Wrong bus load measurement can lead to setting too low interconnect bus >> clock frequency and then bad things happen in peripheral devices. >> >> I agree the workqueue issue needs to be fixed. I have some WIP code to use >> the performance counters overflow interrupts instead of SW polling and with >> that the interconnect bus clock control seems to work much better. >> > > Thank you for sharing your use case and investigation results. I think > we are reaching a decent number of developers to maybe address this > issue: 'workqueue issue needs to be fixed'. > I have been facing this devfreq workqueue issue ~5 times in different > platforms. > > Regarding the 'performance counters overflow interrupts' there is one > thing worth to keep in mind: variable utilization and frequency. > For example, in order to make a conclusion in algorithm deciding that > the device should increase or decrease the frequency, we fix the period > of observation, i.e. to 500ms. That can cause the long delay if the > utilization of the device suddenly drops. For example we set an > overflow threshold to value i.e. 1000 and we know that at 1000MHz > and full utilization (100%) the counter will reach that threshold > after 500ms (which we want, because we don't want too many interrupts > per sec). What if suddenly utilization drops to 2% (i.e. from 5GB/s > to 250MB/s (what if it drops to 25MB/s?!)), the counter will reach the > threshold after 50*500ms = 25s. It is impossible just for the counters > to predict next utilization and adjust the threshold. [...] irq triggers for underflow and overflow, so driver can adjust freq -- Best regards, Kamil Konieczny Samsung R&D Institute Poland