On 16 September 2015 at 11:26, Juri Lelli <juri.lelli@xxxxxxx> wrote:
>
> Hi Steve,
>
> thanks a lot for this interesting discussion.
>
> On 16/09/15 00:55, Steve Muckle wrote:
> > On 09/15/2015 08:00 AM, Patrick Bellasi wrote:
> >>> Agreed, though I also think those tunable values might also change for a
> >>> given set of tasks in different circumstances.
> >>
> >> Could you provide an example?
> >>
> >> In my view the per-task support should be exploited just for quite
> >> specialized tasks, which are usually not subject to many different
> >> phases during their execution.
> >
> > The surfaceflinger task in Android is a possible example. It can have
> > the same issue as the graphics controller task you mentioned - needing
> > to finish quickly so the overall display pipeline can meet its deadline,
> > but often not exerting enough CPU demand by itself to raise the
> > frequency high enough.
> >
>
> SurfaceFlinger timeliness requirements, and maybe AudioFlinger's and
> others' as well, might be better expressed by using other scheduling
> classes, IMHO. SCHED_DEADLINE, for example, has built-in explicit
> deadlines awareness and might work better with this kind of activities.

I fully agree on this point: we must be careful not to create a knob in
one sched class to solve a latency/perf/power issue that can be easily
solved with a more appropriate sched class. SurfaceFlinger under
SCHED_DEADLINE is a good example of this kind of "critical" task that
can accept a limited amount of latency.

Vincent

> Not to mention that Android has already started using SCHED_FIFO for
> some of its time sensitive tasks. It seems to me that the long run goal
> should be to give the scheduler more information about what is going on
> and then use such information to make more informed decisions
> (scheduling, OPP selection, etc.).
>
> > Since mobile platforms are so power sensitive though, it won't be
> > possible to boost surfaceflinger all the time. Perhaps the
> > surfaceflinger boost could be managed by some sort of userspace daemon
> > monitoring the sort of usecase running and/or whether display deadlines
> > are being missed, and updating a schedtune boost cgroup.
> >
>
> I'd say you would like to "boost" just enough to meet a certain quality
> of service in the end.
>
> >> For example, in a graphics rendering pipeline usually we have a host
> ...
> >> With SchedTune we would like to get a similar result to the one you
> >> describe using min_sample_time and above_hispeed_delay by linking
> >> somehow the "interpretation" of the PELT signal with the boost value.
> >>
> >> Right now we have in sched-DVFS an idle % headroom which is hardcoded
> >> to be ~20% of the current OPP capacity. When the CPU usage crosses
> >> that threshold, we switch straight to the max OPP.
> >> If we could figure out a proper mechanism to link the boost signal to
> >> both the idle % headroom and the target OPP, I think we could achieve
> >> quite similar results to what you can get with the knobs offered by
> >> the interactive governor.
> >> The more you boost a task, the bigger the idle % headroom and the
> >> higher the OPP you will jump to.
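To make the linkage Patrick describes concrete, here is a minimal C
sketch of how a single boost percentage could scale both the idle
headroom and the target OPP. All the names and the linear mapping are
hypothetical illustrations, not actual sched-DVFS code:

#include <stdbool.h>

/* Hypothetical sketch only -- not sched-DVFS source. */
#define BASE_HEADROOM_PCT 20    /* today's hardcoded ~20% headroom */

/* More boost -> larger headroom -> the OPP raise triggers earlier. */
static int boost_to_headroom_pct(int boost_pct)
{
    return BASE_HEADROOM_PCT +
           boost_pct * (100 - BASE_HEADROOM_PCT) / 100;
}

/* Raise the OPP once usage eats into the boost-scaled headroom. */
static bool should_raise_opp(unsigned long usage, unsigned long capacity,
                             int boost_pct)
{
    int headroom = boost_to_headroom_pct(boost_pct);

    return usage * 100 > capacity * (100 - headroom);
}

/* Instead of always jumping straight to the max OPP, let the boost pick
 * how far up the OPP table to go: boost 0 -> next OPP, 100 -> max OPP. */
static int pick_target_opp(int cur_opp, int max_opp, int boost_pct)
{
    if (cur_opp >= max_opp)
        return max_opp;
    return cur_opp + 1 + boost_pct * (max_opp - cur_opp - 1) / 100;
}

With this shape, boost=100 reproduces today's "cross the threshold, go
straight to max" behavior, while boost=0 keeps the current ~20% trigger
point but takes only a single OPP step.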
> >
> > Let's say I have a system with one task (to set aside the per-task vs.
> > global policy issue temporarily) and I want to define a policy which
> >
> > - quickly goes to 1.2GHz when the current frequency is less than
> >   that and demand exceeds capacity
> >
> > - waits at least 40ms (or just "a longer time") before increasing the
> >   frequency if the current frequency is 1.2GHz or higher
> >
> > This is similar to (though a simplification of) what interactive is
> > often configured to do on mobile platforms. AFAIK it's a fairly common
> > strategy, due to the power-perf curves and OPPs available on CPUs,
> > while striving to maintain decent UI responsiveness.
> >
>
> Not that this is already in place, but, once we have an energy model
> of the platform available to the scheduler (the EAS idea), shouldn't
> this kind of consideration be possible without any explicit
> configuration? I mean, it seems to me that you start reasoning about
> trade-offs after you have obtained power-perf curves for your platform;
> but, once this data is available to the scheduler, don't you think we
> could put a bit more intelligence there to make the same kind of
> decisions you would configure a governor to make?
>
> > Even with the proposed modification to link boost with idle % and target
> > OPP, I don't think there'd currently be a way to express this policy,
> > which goes beyond linear scaling of the magnitude of CPU demand
> > requested by a task, the idle headroom, or the target OPP.
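For reference, the two-rule policy Steve describes would look roughly
like the sketch below. This is plain C with invented names, not
interactive-governor source; the knee frequency and hold time are just
the numbers from the example:

#include <stdbool.h>
#include <stdint.h>

#define KNEE_FREQ_KHZ 1200000                /* the 1.2GHz knee */
#define UP_HOLD_NS    (40ULL * 1000 * 1000)  /* the 40ms hold */

/* Stand-in for "step to the next available OPP above cur_khz". */
static uint64_t next_opp_up(uint64_t cur_khz)
{
    return cur_khz + 100000;
}

static uint64_t pick_freq(uint64_t cur_khz, bool demand_exceeds_cap,
                          uint64_t now_ns, uint64_t last_raise_ns)
{
    if (!demand_exceeds_cap)
        return cur_khz;
    if (cur_khz < KNEE_FREQ_KHZ)
        return KNEE_FREQ_KHZ;         /* rule 1: ramp straight to the knee */
    if (now_ns - last_raise_ns >= UP_HOLD_NS)
        return next_opp_up(cur_khz);  /* rule 2: climb slowly above it */
    return cur_khz;                   /* still inside the 40ms window */
}

The second rule is the sticking point: it is a condition in the time
domain, so no amount of linearly scaling a utilization signal, the idle
headroom, or the target OPP can express "wait 40ms before raising the
frequency further".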
> >
> >>
> > ...
> >>> The hardcoded values in the
> >>> task load tracking algorithm seem concerning though from a tuning
> >>> standpoint.
> >>
> >> I agree, that's why we are thinking about the solution described
> >> before. Exploiting the boost value to replace the hardcoded thresholds
> >> should give more flexibility while being per-task defined.
> >> Hopefully, tuning per task can be easier and more effective than
> >> selecting a single value fitting all needs.
> >>
> >>>
> >>>>> The interactive functionality would require additional knobs. I
> >>> ...
> >>>> However, regarding specifically the latency on OPP changes, there are
> >>>> a couple of extensions we were thinking about:
> >>>> 1. link the SchedTune boost value with the % of idle headroom which
> >>>>    triggers an OPP increase
> >>>> 2. use the SchedTune boost value to define the high frequency to jump
> >>>>    to when a CPU crosses the % of idle headroom
> >>>
> >>> Hmmm... This may be useful (only testing/profiling would tell) though it
> >>> may be nice to be able to tune these values.
> >>
> >> Again, in my view the tuning should be per task, with a single knob.
> >> The value of the knob should then be properly mapped onto other internal
> >> values to obtain a well defined behavior driven by information shared
> >> with the scheduler, i.e. a PELT signal.
> >>
> >>>> These are tunables which allow us to parameterize the way the PELT
> >>>> signal for CPU usage is interpreted by the sched-DVFS governor.
> >>>>
> >>>> How such tunables should be exposed and tuned is to be discussed.
> >>>> Indeed, one of the main goals of sched-DVFS, and of SchedTune
> >>>> specifically, is to simplify the tuning of a platform by exposing to
> >>>> userspace a reduced number of tunables, preferably just one.
> >>>
> >>> This last point (the desire for a single tunable) is perhaps at the root
> >>> of my main concern. There are users/vendors for whom the current
> >>> tunables are insufficient, resulting in their hacking the governors to
> >>> add more tunables or features in the policy.
> >>
> >> We should also consider that we are proposing not only a single
> >> tunable but also a completely different standpoint: no longer a "blind"
> >> system-wide view of average system behavior, but a more detailed view
> >> of per-task behavior. A single tunable used to "tag" tasks may not be
> >> such a limited solution in this design.
> >
> > I think the algorithm is still fairly blind. There still has to be a
> > heuristic for future CPU usage; it's now just per-task and in the
> > scheduler (PELT), whereas it used to be per-CPU and in the governor.
> >
> > This allows for good features like adjusting frequency right away on
> > task migration/creation/exit, per-task boosting, etc., but I think
> > policy will still be important. Tasks change their behavior all the
> > time, at least in the mobile usecases I've seen.
> >
> >>> Consolidating CPU frequency and idle management in the scheduler will
> >>> clean things up and probably make things more effective, but I don't
> >>> think it will remove the need for a highly configurable policy.
> >>
> >> This can be verified only by starting to use sched-DVFS + SchedTune on
> >> real/synthetic setups to see which features are missing, or which
> >> specific use-cases are not properly managed.
> >> If we are able to set up these experiments, perhaps we will be able to
> >> identify a better design for a scheduler-driven solution.
> >
> > Agree. I hope to be able to run some of these experiments to help.
> >
> >>> I'm curious about the drive for one tunable. Is that something there's
> ...
> >> We have plenty of experience, collected over the past years, on CPUFreq
> >> governors and customer-specific mods.
> >> Don't you think we can exploit that experience to reason about a
> >> fresh new design that satisfies all requirements while providing a
> >> possibly simpler interface?
> >
> > Sure. I'm just communicating requirements I've seen :) .
> >
>
> And that's great! :-)
>
> >> I agree with you that all the current scenarios must be supported by
> >> the new proposal. We should probably start by listing them and come
> >> up with a set of test cases that let us verify where we are wrt the
> >> state of the art.
> >
> > Sounds like a good plan to me... Perhaps we could discuss some mobile
> > usecases next week at Linaro Connect?
> >
>
> I'm up for it!
>
> Best,
>
> - Juri
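To make the SCHED_DEADLINE suggestion from earlier in the thread
concrete, moving a task like surfaceflinger to that class would look
roughly like the userspace sketch below. sched_setattr() has no glibc
wrapper, so the raw syscall is used, and the runtime/deadline/period
numbers are made up for illustration (roughly "3ms of CPU every 16.6ms
display frame"):

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

/* Layout of the sched_setattr() argument (Linux >= 3.14). */
struct sched_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;
    uint64_t sched_deadline;
    uint64_t sched_period;
};

static int sched_setattr(pid_t pid, const struct sched_attr *attr)
{
    return syscall(__NR_sched_setattr, pid, attr, 0);
}

int main(void)
{
    struct sched_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size         = sizeof(attr);
    attr.sched_policy = SCHED_DEADLINE;
    /* Made-up budget: 3ms of runtime every 16ms period. */
    attr.sched_runtime  =  3000000;   /* 3ms, in ns */
    attr.sched_deadline = 16000000;   /* 16ms */
    attr.sched_period   = 16000000;   /* 16ms */

    if (sched_setattr(0, &attr)) {
        perror("sched_setattr");
        return 1;
    }
    /* ... the render loop would run here with a guaranteed budget ... */
    return 0;
}

The attraction for OPP selection is that runtime and period directly
encode the bandwidth the task needs, so the scheduler gets the "how much
CPU, by when" information explicitly instead of inferring it from a
boost knob.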