On Wed, Apr 10, 2013 at 10:18 AM, Ville Syrj?l? <ville.syrjala at linux.intel.com> wrote: >> > > + dpfc_ctl |= (HSW_DPFC_CTL_FENCE_EN | obj->fence_reg); >> > >> > The CPU fence field must be written with 0. SNB/IVB could do with the >> > same fix. >> > >> >> Where did you get this restriction for HSW? > > BSpec. > >> Should we write 0 or not touch >> by removing lis line? > > Not sure. The spec doesn't say whether the "CPU fence enable" bit still > has some meaning or not. I guess the safe approach would be to set the > fence to 0 and only set the fence enable bit when we actually have a > fence. > > BTW the whole dpfc_ctl handing seems to a bit broken. You're doing it > via RMW, but you're not clearing the fields that you modify. So you > could be just ORing bits to whatever garbage the register had before. Ok, I guess it's time for me to write down the fbc master plan here ;-) So imo psr and fbc fit into the exact same category: Both safe a bit of power if the display contents doesn't change. And for both we need to know when contents has changed so that we can either disable it or through some hw magic update screen contents. So imo we can (and should) wire up all the psr magic into the same intel_enable/disable_fbc functions. Now on platforms thus far the power savings part of fbc worked usually rather well, real ugliness kicked in when trying to enable the hw magic to detect display contents changes. The hw supplies three pieces for that: - cpu fence to detect writes from the cpu through the gtt - blt fbc range - each time the blt touches anything in there fbc gets a note - render fbc range - same - pageflips send a signal to fbc/psr to update screen Those above three things are usually riddled with workarounds, tend to hang the box (or as in the case of the gen6 blitter) totally kill throughput when enabled. The only thing which seems to work is the pageflip stuff. So my proposal is that before we enable anything like fbc or psr by default, we can all this hw magic. Completely. And then we replace it with some nice software tracking. The main idea is that at least on modern desktop, screen updates always happen with pageflips. That means for X only one active crtc since X still doesn't have per-crtc pixmaps, but meh. Now for the pageflipping compositor model we don't need any of the fancy (and usually broken) hw magic to detect updates to the currently scanning out framebuffer - the compositor always renders to the backbuffer. So the idea is now to enable fbc/psr when pageflipping and disable it as soon as we detect a frontbuffer rendering event. Thanks to the current gem api we can do this pretty cheaply: - CPU writes to the frontbuffer call sw_finish_ioctl. I need to still check whether that only applies to cpu maps or also gtt maps. - GTT writes could easily be caught by unmapping userspace ptes (we have the infrastructure for that already), but I hope we can avoid that. - Currently we have some fragile code in our execbuf handler to detect render of the gpu. I think we can rip this all out and instead detect frontbuffer gpu rendering in the busy ioctl - that has been established as the canonical interface for userspace to ask the kernel to flush the frontbuffer. Aside: This semantics was not designed, but we've realized that this is what it is eventually ;-) Now once we have done the Big Rip and restricted fbc/psr enabling to pageflips and disabling to these few places there's one thing left: We need proper locking. My proposal is to add a new fbc_mutex which protects all the special fbc state. That way we can avoid the current ugly locking hell between kms and gem: Since due to modeset/dpms changes we need to update fbc state, we enable it in the kms pageflip code if possible, but gem events ditacte when we need to disable it again. But if the fbc state is protected by its own mutex which needs within the kms/gem locks we should be fine. Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch