On Fri, Sep 6, 2013 at 5:16 AM, Daniel Vetter <daniel@xxxxxxxx> wrote:
> On Thu, Sep 05, 2013 at 10:06:52PM -0700, John Stultz wrote:
>> On 09/05/2013 08:26 PM, Rob Clark wrote:
>> > On Thu, Sep 5, 2013 at 8:49 PM, John Stultz <john.stultz@xxxxxxxxxx> wrote:
>> >> Hey everyone,
>> >>
>> >> In preparation for the Plumbers Android+Graphics microconf, I wanted
>> >> to send out some background documentation to try to get all the
>> >> context we can out there prior to the discussion, as time will be
>> >> limited and it would be best to spend it discussing solutions rather
>> >> than re-hashing problems and requirements.
>> >>
>> >> I'm sure many folks on this list could probably do a better job
>> >> summarizing the issues, but I wanted to get this out there to try to
>> >> enumerate the problems and the different perspectives on the issues
>> >> that I'm aware of.
>> >>
>> >> The document is on LWN here:
>> >> http://lwn.net/SubscriberLink/565469/9d88daa2282ef6c2/
>> >
>> > oh, I had missed that article.. fwiw
>>
>> It was published just moments before I sent out this thread, so I
>> wouldn't have expected anyone to have seen it yet. :)
>>
>> > "Another possible solution is to allow dma-buf exporters to not
>> > allocate the backing buffers immediately. This would allow multiple
>> > drivers to attach to a dma-buf before the allocation occurs. Then,
>> > when the buffer is first used, the allocation is done; at that time,
>> > the allocator could scan the list of attached drivers and be able to
>> > determine the constraints of the attached devices and allocate memory
>> > accordingly. This would allow user space to not have to deal with any
>> > constraint solving."
>> >
>> > That is actually how dma-buf works today. And at least with GEM
>> > buffers exported as dma-buf's, the allocation is deferred.
>> > It does require attaching the buffers in all the devices that will be
>> > sharing the buffer up front (but I suppose you need to know the
>> > involved devices one way or another with any solution, so this
>> > approach seems as good as any). We *do* still need to spiff up
>> > dev->dma_parms a bit more, and there might be some room for some
>> > helpers to figure out the union of all attached devices' constraints,
>> > and allocate suitable backing pages... so perhaps this is one thing we
>> > should be talking about.
>>
>> Ok. I had gone looking for an example of the deferred allocation, but
>> didn't find it. I'll go look again, but if you have a pointer, that
>> could be useful.
>>
>> So yeah, I do think this is the most promising approach, but sorting
>> out the next steps for doing a proof of concept is one thing I'd like
>> to discuss (as mentioned in the article, via an ION-like generic
>> allocator, or by trying to wire the constraint solving into some
>> limited set of drivers via generic helper functions). As well as
>> getting a better understanding of the Android developers' concerns
>> about any non-deterministic cost of allocating at mmap time.
>>
>> Thanks for the feedback and thoughts! I'm hopeful some approach to
>> resolving the various issues can be found, but I suspect it will have
>> a few different parts.
>
> My main gripe with ION is that it creates a parallel infrastructure for
> figuring out the allocation constraints of devices. Upstream already has
> all the knowledge (or at least most of it) for cache flushing, mapping
> into IOMMUs and allocating from special pools, stored in association
> with the device structure. So imo an upstream ION thing should reuse the
> information each device and its driver already has available.

yeah, we want to make sure that dma-mapping is up to snuff for handling
allocations of backing pages meeting the constraints of a set of devices
(spiffing up dma_parms, etc, as I mentioned in my first reply).
I see a potential upstream ION as just being a sort of convenience
wrapper for android userspace, rather than an actual allocator of backing
pages, etc. Well, maybe some of this is easier to do in
userspace/gralloc, but for example, to ease "jank" fears, it could
pre-attach to all the involved devices for the use-case, and then do a
dummy map_attachment to the ION device to force backing page allocation.

BR,
-R

> Now I also see that a central allocator has upsides, since reinventing
> this wheel for every device driver is not a great idea. One idea to get
> there and keep the benefits of ION with up-front allocations would be:
>
> 1) Allocate the dma-buf handle at the central allocator. No backing
> storage gets allocated.
>
> 2) Import that dma-buf everywhere you want it to be used. That way
> userspace doesn't need to deal with whatever hw madness is actually
> used to implement the drm/v4l/whatever device nodes internally.
>
> 3) Ask the central allocator to finalize the buffer allocation
> placement and grab backing storage.
>
> If any further attaching happens that doesn't work out, it would simply
> fail, and userspace gets to keep the pieces. Which is already the case
> in today's upstream when userspace is unlucky and doesn't pick the most
> constrained device.
>
> This only tackles the "make memory allocation predictable" issue ION
> solves, which leaves the optimized cache flushing. We can add caches
> for pre-flushed objects for that (not rocket science, most of the drm
> drivers have that wheel reinvented, too). That leaves us with
> optimizing cache flushes (i.e. leaving them out when switching between
> devices without cpu access in between). The current linux dma api
> doesn't really support this, so we need to add a bit of interface there
> to be able to do device-to-device cache flushing (which, save for maybe
> iommu flushes, I expect to be no-ops). And the central allocator
> obviously needs to keep track of where the current cache domain is.
> Aside: Intel Atom SoCs have the same cache flushing challenges, since
> all the gfx blocks (gpu, display, camera, ...) prefer direct main
> memory access that bypasses gpu caches. Big core stuff is obviously
> different and fully coherent. So we need a solution for this, too, but
> unfortunately the camera driver guys haven't yet managed to upstream
> their driver, so it's not possible for us to demonstrate anything on
> upstream :( Same story as everywhere else in SoC-land, I guess ...
>
> Now, one thing I've missed from your article on the GEM vs. ION topic
> is that gem allows buffers to be swapped out. That works by allocating
> shmemfs nodes, but that doesn't really work together nicely with the
> current linux dma apis. Which means that drivers have a bunch of hacks
> to work around this (and ttm has an entire page cache as a 2nd
> allocation step to get at the right dma-api-allocated pages).
>
> There's been the occasional talk about a gemfs to rectify these
> allocation issues. If we'd merge this with the central allocator and
> optionally allow it to swap out/move backing storage pages (and also
> back them with a fs node ofc), then we could rip out a bit of code from
> drm drivers. I also think that this would be the only approach to
> actually make PRIME work together with IOMMUs. There are some really
> old patches from Chris Wilson to teach i915-gem to directly manage the
> backing storage swapping, so patching this into the central allocator
> shouldn't be too nefarious.
>
> So that's my rough sketch of the brave new world I have in mind. Please
> poke holes ;-)
>
> Cheers, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel