cc Jean

Hello Tomi,

On Wed, 16 May 2012, Tomi Valkeinen wrote:

> I also suspect that this could be just a plain DSS bug. The default FIFO
> low/high thresholds are 960/1023 bytes (i.e. DSS starts refilling the
> FIFO when there are 960 or less bytes in the fifo, and stops at 1023.
> The fifo is 1024 bytes). The values are calculated with fifo_size -
> burst_size and fifo_size - 1.
>
> We are now using FIFO merge features, which combines multiple fifos into
> one when possible, making the fifo size 1024*3 = 3072. Using the same
> low threshold and increasing the high threshold to 960/3071 works fine.
> Changing the high threshold to 3008 causes underflows. Increasing the
> low threshold to ~1600 makes DSS work again.

Just a few thoughts.

In terms of the high threshold, it seems really strange to me that
changing the high threshold would make such a difference. Naïvely, I'd
assume that you'd want to set it as high as possible? I suppose in
cases where the interconnect is congested, setting it lower might allow
lower latency for other interconnect users, but I'd hope we don't have
to worry much about that. So it doesn't seem to me that there would be
any advantage to setting it lower than the maximum.

Probably the low threshold is the more important parameter, from a PM
perspective. If you know the FIFO's drain rate and the low threshold,
it should be possible to calculate the maximum latency that the FIFO can
tolerate to avoid an underflow. This could be used to specify a device
PM QoS constraint to prevent the interconnect latency from exceeding
that value.

I'd guess the calculations would be something like this -- (I hope you
can correct my relative ignorance of the DSS in the following
estimates):

Looking at mach-omap2/board-rx51-video.c, let's suppose that the FIFO
drain rate would be 864 x 480 x 32 bits/second. Since the FIFO width is
32 bits, that's

   864 x 480 = 414 720 FIFO entries/second, or

   (1 000 000 µs/s / 414 720 FIFO entries/s) = ~2.411 µs/FIFO entry.
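
The arithmetic above can be written down as a small calculator, which may
be handy for trying other drain rates or thresholds. This is only a rough
sketch: the function names are made up for illustration, and the drain
rate is the same assumed 864 x 480 entries/second figure, not a measured
value.

```python
# Back-of-the-envelope FIFO latency budget (illustrative names only).

US_PER_SECOND = 1_000_000


def us_per_fifo_entry(entries_per_second):
    """Time, in microseconds, for the FIFO to drain one entry."""
    return US_PER_SECOND / entries_per_second


def latency_budget_us(low_threshold_entries, entries_per_second):
    """Upper bound on the refill latency the FIFO can absorb.

    Once the fill level reaches the low threshold, this is how long the
    remaining entries last at the assumed drain rate. A usable
    constraint would have to be somewhat smaller than this.
    """
    return low_threshold_entries * us_per_fifo_entry(entries_per_second)


# Assumed drain rate from the estimate above: 864 x 480 entries/second.
drain_rate = 864 * 480
per_entry_us = us_per_fifo_entry(drain_rate)
```

Note that latency_budget_us() uses the unrounded per-entry time, so its
result comes out slightly larger than a hand calculation based on the
rounded 2.411 µs figure.
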

So if you need a low FIFO threshold at 960 entries, you could call the
device PM QoS functions to set a wakeup latency constraint for the
interconnect that would be nothing greater than this:

   (2.411 µs/FIFO entry * 960 FIFO entries) = 2 314.56 µs

(The reality is that it would need to be something less than this, to
account for the time needed for the GFX DMA transfer to start supplying
data, etc.)

The ultimate goal, with Jean's device PM QoS patches, is that these
constraints could change the DPLL autoidle settings or powerdomain
states to ensure the constraint was met. He's got a page here:

   http://omappedia.org/wiki/Power_Management_Device_Latencies_Measurement

(Unfortunately it's not clear what the DPLL autoidle modes and voltage
scaling bits are set to for many of the estimates, and we also know that
there are many software optimizations possible for our idle path.)

We're still working on getting the OMAP device PM QoS patches merged,
but the Linux core support is there, so you should be able to patch your
drivers to use them -- see for example dev_pm_qos_add_request().

...

Similarly, for the low-power refresh case, if you know the GFX FIFO
drain rate and the various latencies, it should be possible to estimate
the minimum low threshold value needed in order to avoid a FIFO
underflow. (By "various latencies," I mean the DPLL relock latency, the
GFX DMA latency between initiating a transfer and receiving the first
result data, etc. Some of these latencies may be difficult to estimate
accurately. But if the major sources of variation can be identified,
such as DPLL relock time or GFX DMA FIFO refill time, I'd hope we can
just use trial and error to find some worst-case constant for the rest.)

The goal in this case would be to allow DPLL3 to stay unlocked for as
long as possible, to save energy. This would imply finding the lowest
possible FIFO low threshold that doesn't generate underflows.
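
As a rough sketch of that estimate: given a drain rate and a list of
latency contributions, the lowest workable low threshold is the number of
entries drained during the summed latency, rounded up. The function name
and the idea of passing the latencies as a list are assumptions for
illustration only; the values passed in would have to come from the TRM
or from measurement.

```python
# Minimum FIFO low threshold for a given set of latencies
# (illustrative sketch, not driver code).

US_PER_SECOND = 1_000_000


def min_low_threshold(entries_per_second, latencies_us):
    """Smallest number of FIFO entries that outlasts the given latencies.

    entries_per_second: assumed FIFO drain rate (integer).
    latencies_us: iterable of integer latencies in microseconds, e.g.
        DPLL relock time, DMA start-up time, a trial-and-error margin.
    """
    total_us = sum(latencies_us)
    # Integer ceiling division: entries drained while waiting, rounded up
    # so that a partially drained entry still counts as a whole one.
    return -(-total_us * entries_per_second // US_PER_SECOND)
```

Keeping the contributions separate makes it easier to see which latency
dominates the threshold and which ones are just a padded constant.
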
Using the lowest possible low threshold should leave as much room as
possible in the FIFO for data, and thus maximize the amount of time that
DPLL3 can stay unlocked after the high threshold is reached.

Since the DPLL relock latency figures are known from the TRM section
4.7.6.7 "Latencies," we can estimate the DPLL's contribution to the low
threshold setting. The DPLL relock latency depends on the DPLL's input
rate and some DPLL settings, so it can vary. (We probably need a
function for the interconnect device that can estimate the worst-case
wakeup latency for the DSS to use, based on the rest of the system
settings.)

Let's reuse the 2.411 µs/FIFO entry estimate from above. For
convenience, let's suppose that the DPLL relock latency from DPLL-OFF is
1.5 ms = 1500 µs. So we know that the number of FIFO slots needed simply
to endure the DPLL relock process is

   CEIL(1500 µs/relock / 2.411 µs/FIFO entry) = CEIL(622.14 ...)
                                              = 623 FIFO entries/relock

This of course doesn't account for the time needed for the GFX DMA
transfer to start delivering useful data, any voltage scaling needed,
etc.

...

Just paging through the DSS TRM section, some other settings that might
be worth checking are:

- is DISPC_GFX_ATTRIBUTES.GFXBURSTSIZE set to 16x32?

- is DISPC_GFX_ATTRIBUTES.GFXFIFOPRELOAD set to 1?

- is DISPC_GFX_PRELOAD.PRELOAD set to the maximum possible value?

- is DISPC_CONFIG.FIFOFILLING set to 1?

> So I think that the high thresholds of 3071 and 3008 are so close to
> each other that there shouldn't be any real difference in practice,
> presuming everything works. But, for whatever reason, fetching of the
> pixels becomes much more inefficient or with much higher start latency,
> causing the underflows.

That's really weird.

- Paul