On Wed, Jun 08, 2022 at 04:48:47PM +0200, Maxime Ripard wrote:
> On Wed, Jun 01, 2022 at 02:35:35PM +0200, Daniel Vetter wrote:
> > On Tue, May 31, 2022 at 10:58:35AM +0200, Maxime Ripard wrote:
> > > Hi Daniel,
> > >
> > > Thanks for your feedback
> > >
> > > On Wed, May 25, 2022 at 07:18:07PM +0200, Daniel Vetter wrote:
> > > > > > VBLANK Events and Asynchronous Commits
> > > > > > ======================================
> > > > > > When should the VBLANK event complete? When the pixels have been blitted
> > > > > > to the kernel's shadow buffer? When the first frame of the waveform is
> > > > > > sent to the panel? When the last frame is sent to the panel?
> > > > > >
> > > > > > Currently, the driver is taking the first option, letting
> > > > > > drm_atomic_helper_fake_vblank() send the VBLANK event without waiting on
> > > > > > the refresh thread. This is the only way I was able to get good
> > > > > > performance with existing userspace.
> > > > >
> > > > > I've been having the same kind of discussions in private lately, so I'm
> > > > > interested by the answer as well :)
> > > > >
> > > > > It would be worth looking into the SPI/I2C panels for this, since it's
> > > > > basically the same case.
> > > >
> > > > So it's maybe a bit misnamed and maybe kerneldocs aren't super clear (pls
> > > > help improve them), but there's two modes:
> > > >
> > > > - drivers which have vblank, which might be somewhat variable (VRR) or
> > > >   become simulated (self-refresh panels), but otherwise is a more-or-less
> > > >   regular clock. For this case the atomic commit event must match the
> > > >   vblank events exactly (frame count and timestamp)
> > >
> > > Part of my interrogation there is do we have any kind of expectation
> > > on whether or not, when we commit, the next vblank is going to be the
> > > one matching that commit or we're allowed to defer it by an arbitrary
> > > number of frames (provided that the frame count and timestamps are
> > > correct)?
> >
> > In general yes, but there's no guarantee. The only guarantee we give for
> > drivers with vblank counters is that if you receive a vblank event (flip
> > complete or vblank event) for frame #n, then an immediate flip/atomic
> > ioctl call will display earliest for frame #n+1.
> >
> > Also usually you should be able to hit #n+1, but even today with fun stuff
> > like self refresh panels getting out of self refresh mode might take a bit
> > more than a few frames, and so you might end up being late. But otoh if
> > you just do a page flip loop then on average (after the crtc is fully
> > resumed) you should be able to update at vrefresh rate exactly.
>
> I had more the next item in mind there: if we were to write something in
> the kernel that would transparently behave like a full-blown KMS driver,
> but would pipe the commits through a KMS writeback driver before sending
> them to our SPI panel, we would always be at best two vblanks late.
>
> So this would mean that userspace would do a page flip, get a first
> vblank, but the actual vblank for that commit would be the next one (at
> best), consistently.
> >
> > > > - drivers which don't have vblank at all, mostly these are i2c/spi panels
> > > >   or virtual hw and stuff like that. In this case the event simply happens
> > > >   when the driver is done with refresh/upload, and the frame count should
> > > >   be zero (since it's meaningless).
> > > >
> > > > Unfortunately the helper to dtrt has fake_vblank in its name, maybe
> > > > should be renamed to no_vblank or so (the various flags that control it
> > > > are a bit better named).
> > > >
> > > > Again the docs should explain it all, but maybe we should clarify them or
> > > > perhaps rename that helper to be more meaningful.
> > > >
> > > > > > Blitting/Blending in Software
> > > > > > =============================
> > > > > > There are multiple layers to this topic (pun slightly intended):
> > > > > > 1) Today's userspace does not expect a grayscale framebuffer.
> > > > > >    Currently, the driver advertises XRGB8888 and converts to Y4
> > > > > >    in software. This seems to match other drivers (e.g. repaper).
> > > > > >
> > > > > > 2) Ignoring what userspace "wants", the closest existing format is
> > > > > >    DRM_FORMAT_R8. Geert sent a series[4] adding DRM_FORMAT_R1 through
> > > > > >    DRM_FORMAT_R4 (patch 9), which I believe are the "correct" formats
> > > > > >    to use.
> > > > > >
> > > > > > 3) The RK356x SoCs have an "RGA" hardware block that can do the
> > > > > >    RGB-to-grayscale conversion, and also RGB-to-dithered-monochrome
> > > > > >    which is needed for animation/video. Currently this is exposed with
> > > > > >    a V4L2 platform driver. Can this be inserted into the pipeline in a
> > > > > >    way that is transparent to userspace? Or must some userspace library
> > > > > >    be responsible for setting up the RGA => EBC pipeline?
> > > > >
> > > > > I'm very interested in this answer as well :)
> > > > >
> > > > > I think the current consensus is that it's up to userspace to set this
> > > > > up though.
> > > >
> > > > Yeah I think v4l mem2mem device is the answer for these, and then
> > > > userspace gets to set it all up.
> > >
> > > I think the question wasn't really about where that driver should be,
> > > but more about who gets to set it up, and if the kernel could have
> > > some component to expose the formats supported by the converter, but
> > > whenever a commit is being done pipe that to the v4l2 device before
> > > doing a page flip.
> > >
> > > We have a similar use-case for the RaspberryPi where the hardware
> > > codec will produce a framebuffer format that isn't standard. That
> > > format is understood by the display pipeline, and it can do
> > > writeback.
> > >
> > > However, some people are using a separate display (like a SPI display
> > > supported by tinydrm) and we would still like to be able to output the
> > > decoded frames there.
> > >
> > > Is there some way we could plumb things to "route" that buffer through
> > > the writeback engine to perform a format conversion before sending it
> > > over to the SPI display automatically?
> >
> > Currently not transparently. Or at least no one has done that, and I'm not
> > sure that's really a great idea. With big gpus all that stuff is done with
> > separate command submission to the render side of things, and you can
> > fully pipeline all that with in/out-fences.
> >
> > Doing that in the kms driver side in the kernel feels very wrong to me :-/
>
> So I guess what you're saying is that there's a close to 0% chance of it
> being accepted if we were to come up with such an architecture?

Yup. I think the only exception is if you have a multi-region memory
manager using ttm (or hand-rolled, but please don't), where we first have
to move the buffer into the right region before it can be scanned out. And
that's generally done with a copy engine, for performance reasons.

But that copy engine is really just a very dumb (but fast!) memcpy, and
doesn't do any format conversion or stride/orientation changes like a
full-blown blitter engine (or mem2mem in v4l speak) can do.

So if it's really just memory management then I think it's fine, but
anything beyond that is a no imo.

Now for an overall full-featured stack we clearly need that, and it would
be great if there's some common userspace libraries for hosting such code.
But thus far all attempts have fallen short :-/ Which I guess is another
indicator that we really shouldn't try to solve this problem in a generic
fashion, and hence really shouldn't try to solve it with magic behind the
generic kms interface in the kernel.

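To make the copy engine vs. blitter distinction above concrete, here is a
minimal userspace-style sketch in plain C. It is purely illustrative and
not taken from any driver: the function names dumb_copy() and
blit_xrgb8888_to_gray8() are made up, the 8-bit gray output is only a
stand-in for formats like R8/Y4, and the 77/150/29 luma weights are an
assumed BT.601-style approximation. The first function is all a "very dumb
memcpy" does; the second already needs to know the pixel layout and both
strides, which is the kind of policy that belongs in userspace or a
mem2mem device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "Copy engine": moves bytes between regions, pixel layout untouched. */
void dumb_copy(uint8_t *dst, const uint8_t *src, size_t size)
{
	assert(dst && src);
	memcpy(dst, src, size);
}

/*
 * "Blitter": reads XRGB8888 and writes one 8-bit gray byte per pixel,
 * with independent source and destination strides (in bytes).
 * The 77/150/29 weights sum to 256 and are an illustrative assumption.
 */
void blit_xrgb8888_to_gray8(uint8_t *dst, size_t dst_stride,
			    const uint8_t *src, size_t src_stride,
			    unsigned int width, unsigned int height)
{
	assert(dst && src);
	assert(dst_stride >= width && src_stride >= (size_t)width * 4);

	for (unsigned int y = 0; y < height; y++) {
		const uint8_t *s = src + y * src_stride;
		uint8_t *d = dst + y * dst_stride;

		for (unsigned int x = 0; x < width; x++) {
			/* XRGB8888 is little endian: bytes are B, G, R, X. */
			unsigned int b = s[4 * x + 0];
			unsigned int g = s[4 * x + 1];
			unsigned int r = s[4 * x + 2];

			d[x] = (uint8_t)((77 * r + 150 * g + 29 * b) >> 8);
		}
	}
}
```

A caller would hand dumb_copy() the whole buffer size, while
blit_xrgb8888_to_gray8() needs width, height and both strides, e.g.
blit_xrgb8888_to_gray8(gray, width, xrgb, width * 4, width, height) for
tightly packed buffers.
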
For even more context I do think my old "why is 2d so hard" blogpost rant
still applies:

https://blog.ffwll.ch/2018/08/no-2d-in-drm.html

The "why no 2d api for the more limited problem of handling framebuffers"
is really just a small, but not any less complex, subset of that bigger
conundrum.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch