On Fri, Mar 16, 2012 at 05:30:33PM +0900, MyungJoo Ham wrote:
> On Sun, Mar 11, 2012 at 7:53 AM, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
> > On Friday, March 09, 2012, MyungJoo Ham wrote:
> >> On Thu, Mar 8, 2012 at 12:47 PM, mark gross <markgross@xxxxxxxxxxx> wrote:
> >> > On Wed, Mar 07, 2012 at 02:02:01PM +0900, MyungJoo Ham wrote:
> >> >> 1. CPU_DMA_THROUGHPUT
> >> ...
> >> >> 2. DVFS_LATENCY
> >> >
> >> > The cpu_dma_throughput looks ok to me.  I do however; wonder about the
> >> > dvfs_lat_pm_qos.  Should that knob be exposed to user mode?  Does that
> >> > matter so much?  why can't dvfs_lat use the cpu_dma_lat?
> >> >
> >> > BTW I'll be out of town for the next 10 days and probably will not get
> >> > to this email account until I get home.
> >> >
> >> > --mark
> >> >
> >>
> >> 1. Should DVFS Latency be exposed to user mode?
> >>
> >> It would depend on the policy of the given system; however, yes, there
> >> are systems that require a user interface for DVFS Latency.
> >> With the example of user input response (response to user click,
> >> typing, touching, and etc), a user program (probably platform s/w or
> >> middleware) may input QoS requests. Besides, when a new "application"
> >> is starting, such "middleware" may want faster responses from DVFS
> >> mechanisms.
> >
> > But this is a global knob, isn't it?  And it seems that a per-device one
> > is needed rather than that?
> >
> > It also applies to your CPU_DMA_THROUGHPUT thing, doesn't it?
>
>
> Yes, the two are global knobs. And both the two control multiple
> devices simultaneously, not just a single device. I suppose per-device
> QoS is appropriate for QoS requests directed to a single device. Am I
> right about this one?
>
>
> Let's assume that, in an example system, we have devfreq on GPU,
> memory-Interface, and main bus and CPUfreq (Exynos5 will have them all
> separated).
>
> If we use per-device QoS for DVFS LATENCY, in order to control the
> DVFS response latency, we will need to make QoS requests to all the
> four devices independently, not to the global DVFS LATENCY QOS CLASS.
> There, we could have a shared single QoS request list for these four
> DVFS devices, saying that the DVFS response should be done in "50ms"
> after a sudden utilization increase.
>
> We may be able to use "dev_pm_qos_add_notifier()" for a virtual device
> representing "DVFS Latency" or "DMA Throughput" and let the GPU, CPU,
> main-bus, and memory-interface listen to the events from the virtual
> device. Hmm..., do you recommend this approach? creating a device
> representing "DVFS" as a whole (both CPUFreq and device drivers of
> devfreq).
>
> CPU_DMA_THROUGHPUT is quite similar as CPU_DMA_LATENCY. However, we
> think it is additionally needed because many IPs (in-SoC devices) need
> to specify its DMA usage in "kbytes/sec", not "usecs/ops". For
> example, a video-decoding chip device driver may say it requires
> "750000kbytes/sec" for 1080p60, "300000kbytes/sec" for 720p60, and so
> on, which affects CPUfreq, memory-interface, and main-bus at the same
> time.

I have an example of a need for cpu_dma_throughput on x86 SoCs as well.
Mostly my example comes down to the ondemand governor thinking the
workload is low (the GPU is doing all the work), yet the workload needs
higher clock rates between frame times to avoid buffer underruns in the
gfx pipe.

My version of the patch didn't fly too well because it failed to offer a
scalable definition of the units of cpu_dma_throughput.  I tried using
kHz as the unit (the unit used in cpufreq).  However, applications
written to assume Hz-based units on one system would need to be
rewritten on the next.

Perhaps using bandwidth would be better than throughput?

> >
> >> 2. Does DVFS Latency matter?
> >>
> >> Yes, in our experimental sets w/ Exynos4210 (those slapped in Galaxy
> >> S2 equivalent; not exactly as I'm not conducted in Android systems,
> >> but Tizen), we could see noticeable difference w/ bare eyes for
> >> user-input responses. When we shortened DVFS polling interval with
> >> touches, the touch responses were greatly improved; e.g., losing 10
> >> frames into losing 0 or 1 frame for a sudden input rush.
> >
> > Well, this basically means PM QoS matters, which is kind of obvious.
> > It doesn't mean that it can't be implemented in a better way, though.
>
> For DVFS-Latency and DMA-Throughput, I think a normal pm-qos-dev (one
> device per one qos knob) isn't appropriate because there are multiple
> devices that are required to react simultaneously.
>
> It is possible to let multiple devices react by adding notifiers with
> dev_pm_qos_add_notifier(). However, I felt that it wasn't the purpose
> of this one and it might get things ugly. Anyway, was allowing
> multiple devices to change their frequencies/voltages for a single
> per-device QoS list the purpose of dev_pm_qos_add_notifier()?
>
>
> Just throwing an idea and suggestion if it was the purpose,
> I speculate that If we are going to do this (supporting multiple
> devices per one qos knob without adding QoS class), we'd better create
> "qos class device" in /drivers/qos/ and let those qos class handle
> multiple devices depending on a single "qos class". Probably, this
> will transform "global PM-QoS class" that notifies related devices
> into "QoS class device" that notifies related devices.
>
> >
> >> 3. Why not replace DVFS Latency w/ CPU-DMA-Latency/Throughput?
> >>
> >> When we implement the user-input response enhancement with CPU-DMA QoS
> >> requests, the PM-QoS will unconditionally increase CPU and BUS
> >> frequencies/voltages with user inputs.
> >> However, with many cases it is
> >> unnecessary; i.e., a user input means that there will be unexpected
> >> changes soon; however, the change does not mean that the load will
> >> increase. Thus, allowing DVFS mechanism to evolve faster was enough to
> >> shorten the response time and not to increase frequencies and voltages
> >> when not needed. There were significant difference in power
> >> consumption with this changes if the user inputs were not involving
> >> drastic graphics jobs; e.g., typing a text message.
> >
> > Again, you're arguing for having PM QoS rather than not having it.  You don't
> > have to do that. :-)
> >
> > Generally speaking, I don't think we should add any more PM QoS "classes"
> > as defined in pm_qos.h, since they are global and there's only one
> > list of requests per class.  While that may be good for CPU power
> > management (in an SMP system all CPUs are identical, so the same list of
> > requests may be applied to all of them), it generally isn't for I/O
> > devices (some of them work in different time scales, for example).
> >
> > So, for example, most likely, a list of PM QoS requests for storage devices
> > shouldn't be applied to input devices (keyboards and mice to be precise) and
> > vice versa.
> >
> > On the other hand, I don't think that applications should access PM QoS
> > interfaces associated with individual devices directly, because they may
> > not have enough information about the relationships between devices in the
> > system.  So, perhaps, there needs to be an interface allowing applications
> > to specify their PM QoS expectations in a general way (e.g. "I want <number>
> > disk I/O throughput") and a code layer between that interface and device
> > drivers translating those expectations into PM QoS requests for specific
> > devices.
>
> With DVFS Latency PM QoS Class, we can say "I want the system to react
> in 50ms for any sudden utilization increases.".
> Without it, we should
> say, for example, "CPUFreq/Ondemand should set interval at 25ms,
> Devfreq/Bus should set interval at 25ms, and Devfreq/GPU should set
> interval at 10ms."
>
> And with CPU Throughput PM QoS Class, we can say "I want 1000000
> kbytes/sec DMA transfer". Without it, we should say "Memory-Interface
> at 1000000 kbytes/sec, Exynos4412 core should be at least 500MHz, and
> Bus should be at least 166MHz".
>

What it comes down to is whether we can identify good abstractions that
are portable and scalable across ISAs and boards, such that
applications would not need to be changed to work correctly across all
of them.

One issue I have with adding a single DVFS latency and throughput pm-qos
parameter is that which device the DVFS *really* refers to changes from
one board to the next, which makes it impossible to abstract to user
mode.

--mark
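
For illustration only, the "one shared request list, several listening
devices" idea discussed above can be modelled in a few lines of
user-space Python.  This is a rough sketch, not kernel code: the class,
the function, and the per-board divisor table are all hypothetical, and
the divisors are chosen only so that a 50ms request reproduces the
25ms/25ms/10ms polling intervals quoted above.  The point it tries to
show is that the application-facing number stays the same while the
translation table is the part that changes from board to board.

```python
# Toy user-space model; none of these names are kernel APIs.

class LatencyClass:
    """Shared request list: the effective target is the smallest request."""

    def __init__(self, default_ms):
        self.default_ms = default_ms
        self.requests = {}    # requester name -> requested latency (ms)
        self.listeners = []   # callables invoked with the new target (ms)

    def target_ms(self):
        # With no requests pending, fall back to the default.
        return min(self.requests.values(), default=self.default_ms)

    def add_listener(self, callback):
        self.listeners.append(callback)

    def update_request(self, who, latency_ms):
        old = self.target_ms()
        self.requests[who] = latency_ms
        self._notify_if_changed(old)

    def remove_request(self, who):
        old = self.target_ms()
        self.requests.pop(who, None)
        self._notify_if_changed(old)

    def _notify_if_changed(self, old):
        new = self.target_ms()
        if new != old:
            for callback in self.listeners:
                callback(new)


# Hypothetical per-board table: each device polls at target / divisor.
# Another board would ship a different table behind the same interface.
BOARD_DIVISORS = {
    "cpufreq/ondemand": 2,
    "devfreq/bus": 2,
    "devfreq/gpu": 5,
}


def polling_intervals(target_ms, divisors=BOARD_DIVISORS):
    """Translate one global latency target into per-device intervals (ms)."""
    return {dev: target_ms // div for dev, div in divisors.items()}
```

A caller would register one listener per board that applies
polling_intervals() to whatever target the shared list currently yields;
a looser second request (say 100ms) leaves the target, and therefore the
devices, untouched until the tighter one is removed.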