On Fri, Sep 14, 2012 at 10:48 AM, Ville Syrjälä
<ville.syrjala@xxxxxxxxxxxxxxx> wrote:
> On Fri, Sep 14, 2012 at 09:45:18AM -0500, Rob Clark wrote:
>> On Fri, Sep 14, 2012 at 8:58 AM, Ville Syrjälä
>> <ville.syrjala@xxxxxxxxxxxxxxx> wrote:
>> > On Fri, Sep 14, 2012 at 08:25:53AM -0500, Rob Clark wrote:
>> >> On Fri, Sep 14, 2012 at 7:50 AM, Ville Syrjälä
>> >> <ville.syrjala@xxxxxxxxxxxxxxx> wrote:
>> >> > On Thu, Sep 13, 2012 at 11:35:59AM -0500, Rob Clark wrote:
>> >> >> On Thu, Sep 13, 2012 at 9:29 AM, Ville Syrjälä
>> >> >> <ville.syrjala@xxxxxxxxxxxxxxx> wrote:
>> >> >> > On Thu, Sep 13, 2012 at 08:39:54AM -0500, Rob Clark wrote:
>> >> >> >> On Thu, Sep 13, 2012 at 3:40 AM, Ville Syrjälä
>> >> >> >> <ville.syrjala@xxxxxxxxxxxxxxx> wrote:
>> >> [snip]
>> >> >> >> >
>> >> >> >> > I would say this is going to be the most common use case if you consider
>> >> >> >> > just the number of shipping devices. It's pretty much what every Android
>> >> >> >> > phone/tablet with a HDMI port has to do.
>> >> >> >>
>> >> >> >> bleh, surfaceflinger kinda sucks then..
>> >> >> >
>> >> >> > Why? This use case is not enforced by surfaceflinger, it's just the use
>> >> >> > case most devices would have.
>> >> >> >
>> >> >> > I don't think there's anything wrong with the way surfaceflinger is designed
>> >> >> > with the prepare and commit phases. How else would you do it?
>> >> >>
>> >> >> well, maybe I misunderstood how surfaceflinger works, but it sounded
>> >> >> like it has one prepare/commit phase across outputs, vs what weston
>> >> >> compositor does where each output is rendered and flipped
>> >> >> independently at the rate of that particular output. If the two
>> >> >> outputs just happen to be vsync aligned, you would end up flipping at
>> >> >> the same time, but if they are not locked you don't have any artificial
>> >> >> constraint in the rendering/flipping.
>> >> >
>> >> > OK so it's purely a pull based model, whereas surfaceflinger is more
>> >> > push based.
>> >> >
>> >> > I suppose it might be possible to make surfaceflinger support a pull
>> >> > model by driving the compositor loop through a combined signal from
>> >> > multiple outputs. But IIRC it did have some timing related code in
>> >> > there somewhere, so it might not be happy about it. It might also
>> >>
>> >> As I understood, at least in older android versions,
>> >> rendering was based on a timer as there was no vblank event to
>> >> userspace on most SoC platforms (which sounds strange, but so far most
>> >> SoC's are using fbdev and/or crazy hacks rather than drm/kms)
>> >>
>> >> not sure if the timer is still there.. but I hope it goes away, it is
>> >> really a horrible way to keep track of vsync
>> >
>> > I've only looked at ICS in any detail. At least there we used the page
>> > flip event from one display to set the pace of the compositor loop.
>> > IIRC JB is supposed to have some vsync related changes, but I haven't
>> > looked at the code.
>> >
>> >> > affect the clients' rendering speed since the compositor would be
>> >> > pulling their buffers from queue at non-constant speed. I don't
>> >> > remember the details of the buffer management very well, so I can't be
>> >> > sure though. But I probably wouldn't bother trying this, since the
>> >> > straightforward approach is so simple, and the results are reasonably
>> >> > good.
>> >> >
>> >> > The pull model does seem more flexible. But it does require a bit of
>> >> > extra complexity in the compositor to avoid compositing the same scene
>> >> > multiple times needlessly when multiple cloned displays are involved.
>> >> > I suppose ideally you'd want to recompose for each display to minimize
>> >> > visible latency, but from power usage POV it may not be a good idea.
>> >>
>> >> fwiw, weston is already being pretty clever about keeping track of
>> >> damage and minimizing the area of the screen that must be re-rendered.
>> >> I'm not sure if SF does anything like this.
>> >
>> > IIRC it can do that, but the EGL implementation needs to support
>> > EGL_BUFFER_PRESERVED.
>> >
>> > I suppose the best way to implement EGL_BUFFER_PRESERVED with
>> > page flips would be to schedule the flip and immediately perform
>> > a blit from the new front buffer to the new back buffer. Well,
>> > unless the hardware has some more clever mechanism for it.
>> >
>> > Does weston depend on preserved flips too, or can it even track
>> > damage independently for each buffer?
>>
>> well, weston knows how many buffers are at play. So it takes the
>> union of the damage from the last time the buffer was used (well,
>> currently it assumes only double buffered) and the new damage.
>
> With more buffers it'll get a bit more complicated as it needs to keep
> accumulating the damage for all buffers. But it should still be fairly
> trivial when you're in full control of the buffers.

well, just track previous damage per buffer.. but yeah, slightly more
complicated

>> This
>> way it avoids the need for the gl driver, which doesn't know as well what
>> is going on as the app, to do a back-blit. It can do
>> this because w/ drm/gbm egl winsys, eglSwapBuffers() doesn't actually
>> swap the buffers on the display and weston is in charge of which
>> buffer is displayed or rendered. Weston explicitly calls page flip
>> ioctl. The good news being that it can atomically flip overlay layers
>> at the same time once the new ioctl is in place.
>
> Yeah, with EGL in the mix, as can be the case with Android, the layering
> can start to work against you a little bit. Well, it's not too bad based
> on my experience though.
>
>> Maybe it is useful to look at http://github.com/robclark/kmscube .. it
>> doesn't actually use planes, but shows the interaction of egl and kms.
>> Maybe I should enhance it w/ multiple rotating cubes on different
>> overlays. ;-)
>
> When doing the Medfield work, I had a test case which utilized the
> video overlays and the primary plane.
> What it did was draw ugly colored rectangles on the primary layer, and
> positioned the overlays to cover those up exactly. When the atomic page
> flip system was operating as intended you could only see a solid color
> on the screen. It randomly changed the position and size of the
> rectangles and overlays. I really need to dig that up and modify it to
> work with my current code.

it would be nice to resurrect this using drm/gbm stuff so we can use
it as test code on multiple platforms. (I've tested kmscube on i915 and
omap)

>> >> >> note that the test phase doesn't need vblank events, and also
>> >> >> shouldn't -EBUSY if there is still a pending flip[*],
>> >> >
>> >> > Right. Personally I'm not a fan of the EBUSY behaviour at all. Seems
>> >> > a bit pointless since user space can take care of it via the event
>> >> > mechanism. But I suppose you want it for omap so that you can avoid
>> >> > having to write software workarounds to overcome the GO bit
>> >> > limitations.
>> >>
>> >> I think the main issue is disconnecting an overlay from one crtc and
>> >> connecting to another.. I would expect that any hw which can connect
>> >> an ovl to more than one possible crtc would have the same limit (ie.
>> >> have to wait until scanout on previous crtc completes), so I think
>> >> EBUSY is a good way to indicate to userspace that the requested
>> >> configuration is not possible *now* but would be possible in the
>> >> future.
>> >
>> > Intel HW can do the transition automagically, but if you try to
>> > combine it with other page flips, the driver would have to perform some
>> > gymnastics to make things appear atomic. Of course if you'd try to swap
>> > overlay A from pipe 1 to pipe 2, and overlay B from pipe 2 to pipe 1 at
>> > the same time, there's just no way to do that without sacrificing
>> > atomicity on one of the pipes.
>> >
>> > So even with such HW, it's probably easier to forget about the feature,
>> > and require user space to perform the disable+enable sequence in two steps.
>>
>> true, but I don't want to block the disable until vblank w/
>> atomic-pageflip, and if userspace re-enables the plane on a different
>> crtc before the next vblank, it would be useful for the driver to have
>> a way to say 'try again later'.
>
> Yeah, trying to do it all asynchronously in the kernel would perhaps
> be too much work for little gain.
>
> I wonder if you've thought about omap's FIFO merge. It can cause similar
> issues, that is some operations may need two vblanks to complete. And it
> looks like I'll get to worry about this stuff too since there are some
> watermark related wait_for_vblank() workarounds in the IVB sprite code,
> sigh.

yeah, FIFO merge is a nice big headache.. and not really ideal for
latency unless you have some advance warning to disable FIFO merge
before userspace wants to switch on an extra overlay.

I think the best way to deal with it is to just start switching off FIFO
merge when userspace first does test w/ overlay, but return EBUSY. It
means we'll use the gpu for rendering for one frame, but I think that is
better than blocking the compositor for a vblank or two. Thou shalt
not block the compositor.

>> And if we do support multiple crtc's w/ pageflip, I'm not sure if
>> there is a good way to enforce two-steps. Having a standardized way
>> to tell userspace to try later seems like a good thing.
>
> Sure, for that it seems reasonable.
>
>> >> >> >> Also, if you pageflip on multiple CRTC's, should there be multiple
>> >> >> >> vblank events, and multiple userdata's?
>> >> >> >
>> >> >> > That's a bit of an open question. I was considering several options:
>> >> >>
>> >> >> the thing I like about one ioctl per crtc is that it avoids this whole
>> >> >> question..
>> >> >>
>> >> >> And, I think as long as you have to update multiple different scanout
>> >> >> address registers, there is always going to be a race in multi-crtc
>> >> >> flipping. Having a single ioctl does make the race smaller. I'm not
>> >> >> sure how important that point is.
>> >> >
>> >> > Which race?
>> >>
>> >> ie. if you set REG_CRTC1_ADDR just immediately before vblank and
>> >> REG_CRTC2_ADDR just after
>> >
>> > Well, with unsynced crtcs I wouldn't call that any kind of meaningful race.
>> > The same problem after all exists even with a single crtc. You either make
>> > the deadline and write the register before vblank, or you don't make it
>> > and end up with a repeated frame.
>>
>> I meant w/ sync'd crtc's, there is still no 100% guarantee that the
>> two flip at the same time.
>
> Sure there is. That's what the vblank evade stuff gives you. I just
> happen to need it even when using just one crtc because the hardware
> doesn't have the necessary mechanism to flip several planes atomically.

hmm, I guess I don't quite follow then. But I guess I don't know the
intel hw well enough. It seemed like you weren't atomically updating
scanout registers.

But anyways, I think it is probably ok to not need the crtc up-front.
We can catch issues w/ pending vblank at the atomic_test() stage.

Still not sure what to do about userdata. Although I suppose we could
make userdata a property attached to crtc and/or plane and that gives
userspace plenty of flexibility about how many events it wants back.
(Ie. no event if userdata==0.. or maybe separate send-event property.)

BR,
-R

> --
> Ville Syrjälä
> Intel OTC
> _______________________________________________
> dri-devel mailing list
> dri-devel@xxxxxxxxxxxxxxxxxxxxx
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
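
For reference, the "track previous damage per buffer" idea from the
damage discussion in this thread could look roughly like the sketch
below. This is not weston's code. It assumes damage can be simplified to
a single bounding box (a real compositor would use a region type), and
the names struct rect, struct output, output_repaint() and NUM_BUFFERS
are hypothetical, made up for illustration. Each buffer accumulates all
damage that happened while it was not being rendered to, so the repaint
area is the union of the old and the new damage, and no back-blit is
needed.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_BUFFERS 3	/* hypothetical: triple buffering */

/* Damage simplified to one bounding box; empty when x1 >= x2 or y1 >= y2. */
struct rect { int x1, y1, x2, y2; };

static bool rect_empty(struct rect r)
{
	return r.x1 >= r.x2 || r.y1 >= r.y2;
}

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Bounding-box union; an empty rect is the identity element. */
static struct rect rect_union(struct rect a, struct rect b)
{
	struct rect r;

	if (rect_empty(a))
		return b;
	if (rect_empty(b))
		return a;

	r.x1 = min_int(a.x1, b.x1);
	r.y1 = min_int(a.y1, b.y1);
	r.x2 = max_int(a.x2, b.x2);
	r.y2 = max_int(a.y2, b.y2);
	return r;
}

struct output {
	/* Damage that each buffer has not been repainted with yet. */
	struct rect pending[NUM_BUFFERS];
};

/*
 * Add this frame's damage to every buffer's pending damage, then return
 * the area that must be repainted into buffer 'idx'. That buffer is up to
 * date afterwards, so its pending damage is cleared.
 */
static struct rect output_repaint(struct output *o, int idx,
				  struct rect new_damage)
{
	struct rect repaint;
	int i;

	assert(idx >= 0 && idx < NUM_BUFFERS);

	for (i = 0; i < NUM_BUFFERS; i++)
		o->pending[i] = rect_union(o->pending[i], new_damage);

	repaint = o->pending[idx];
	o->pending[idx] = (struct rect){ 0, 0, 0, 0 };
	return repaint;
}
```

With double buffering (NUM_BUFFERS == 2) this reduces to the union of the
previous frame's damage and the new damage. The compositor has to know
which buffer it is about to render into, which is the case when it
controls the page flips itself.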