On Thu, Sep 05, 2013 at 10:06:52PM -0700, John Stultz wrote:
> On 09/05/2013 08:26 PM, Rob Clark wrote:
> > On Thu, Sep 5, 2013 at 8:49 PM, John Stultz <john.stultz@xxxxxxxxxx> wrote:
> >> Hey everyone,
> >> In preparation for the Plumbers Android+Graphics microconf, I wanted to
> >> send out some background documentation to try to get all the context we can
> >> out there prior to the discussion, as time will be limited and it would be
> >> best to spend it discussing solutions rather then re-hashing problems and
> >> requirements.
> >>
> >> I'm sure many folks on this list could probably do a better job summarizing
> >> the issues, but I wanted to get this out there to try to enumerate the
> >> problems and the different perspectives on the issues that I'm aware of.
> >>
> >> The document is on LWN here:
> >> http://lwn.net/SubscriberLink/565469/9d88daa2282ef6c2/
> >
> > oh, I had missed that article.. fwiw
>
> It was published just moments before I sent out this thread, so I
> wouldn't have expected anyone to have seen it yet. :)
>
> >
> > "Another possible solution is to allow dma-buf exporters to not
> > allocate the backing buffers immediately. This would allow multiple
> > drivers to attach to a dma-buf before the allocation occurs. Then,
> > when the buffer is first used, the allocation is done; at that time,
> > the allocator could scan the list of attached drivers and be able to
> > determine the constraints of the attached devices and allocate memory
> > accordingly. This would allow user space to not have to deal with any
> > constraint solving. "
> >
> > That is actually how dma-buf works today. And at least with GEM
> > buffers exported as dma-buf's, the allocation is deferred. It does
> > require attaching the buffers in all the devices that will be sharing
> > the buffer up front (but I suppose you need to know the involved
> > devices one way or another with any solution, so this approach seems
> > as good as any).
> > We *do* still need to spiff up dev->dma_parms a bit
> > more, and there might be some room for some helpers to figure out the
> > union of all attached devices constraints, and allocate suitable
> > backing pages... so perhaps this is one thing we should be talking
> > about.
>
> Ok. I had gone looking for an example of the deferred allocation, but
> didn't find it. I'll go look again, but if you have a pointer, that
> could be useful.
>
> So yea, I do think this is the most promising approach, but sorting out
> the next steps for doing a proof of concept is one thing I'd like to
> discuss (as mentioned in the article, via a ion-like generic allocator,
> or trying to wire in the constraint solving to some limited set of
> drivers via generic helper functions). As well as getting a better
> understanding the Android developers concern about any non-deterministic
> cost of allocating at mmap time.
>
>
> Thanks for the feedback and thoughts! I'm hopeful some approach to
> resolving the various issues can be found, but I suspect it will have a
> few different parts.

My main gripe with ION is that it creates a parallel infrastructure for
figuring out the allocation constraints of devices. Upstream already has
all the knowledge (or at least most of it) for cache flushing, mapping
into iommus and allocating from special pools stored in association with
the device structure. So imo an upstream ION thing should reuse the
information each device and its driver already has available.

Now I also see that a central allocator has upsides, since reinventing
this wheel for every device driver is not a great idea. One idea to get
there and keep the benefits of ION with up-front allocations would be:

1) Allocate the dma-buf handle at the central allocator. No backing
   storage gets allocated.

2) Import that dma-buf everywhere you want it to be used. That way
   userspace doesn't need to deal with whatever hw madness is actually
   used to implement the drm/v4l/whatever device nodes internally.
3) Ask the central allocator to finalize the buffer allocation placement
   and grab backing storage.

If any further attaching happens that doesn't work out, it would simply
fail, and userspace gets to keep the pieces. Which is already the case
in today's upstream when userspace is unlucky and doesn't pick the most
constrained device.

This only tackles the "make memory allocation predictable" issue ION
solves, which leaves the optimized cache flushing. We can add caches for
pre-flushed objects for that (not rocket science, most of the drm
drivers have that wheel reinvented, too).

That leaves us with optimizing cache flushes (i.e. leaving them out when
switching between devices without cpu access in-between). The current
linux dma api doesn't really support this, so we need to add a bit of
interface there to be able to do device-to-device cache flushing (which,
save for maybe iommu flushes, I expect to be noops). And the central
allocator obviously needs to keep track of where the current cache
domain is.

Aside: Intel Atom SoCs have the same cache flushing challenges since all
the gfx blocks (gpu, display, camera, ...) prefer direct main memory
access that bypasses gpu caches. Big core stuff is obviously different
and fully coherent. So we need a solution for this, too, but
unfortunately the camera driver guys haven't yet managed to upstream
their driver, so it's not possible for us to demonstrate anything on
upstream :( Same story as everywhere else in SoC-land I guess ...

Now one thing I've missed from your article on the GEM vs. ION topic is
that gem allows buffers to be swapped out. That works by allocating
shmemfs nodes, but that doesn't really work together nicely with the
current linux dma apis. Which means that drivers have a bunch of hacks
to work around this (and ttm has an entire page cache as a 2nd
allocation step to get at the right dma api allocated pages). There's
been the occasional talk about a gemfs to rectify these allocation
issues.
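
To make the three-step flow and the cache-domain bookkeeping above a bit
more concrete, here is a toy userspace model in Python. It is only a
sketch under stated assumptions, not kernel code and not any existing
API: the names Constraints, Device, Buffer, satisfies, finalize and
begin_access are all made up for illustration, the two constraint fields
are just loosely inspired by dev->dma_parms, and device-to-device
handovers are simply assumed to need no flush (iommu flushes are
ignored).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Constraints:
    """Hypothetical per-device limits (illustrative only)."""
    max_segments: int    # 1 means the device needs contiguous memory
    dma_mask_bits: int   # number of address bits the device can reach


@dataclass
class Device:
    name: str
    constraints: Constraints


def satisfies(placement: Constraints, need: Constraints) -> bool:
    """True if memory placed under `placement` is usable by a device with `need`."""
    return (placement.max_segments <= need.max_segments
            and placement.dma_mask_bits <= need.dma_mask_bits)


@dataclass
class Buffer:
    size: int                                  # step 1: handle only, no storage
    attached: List[Device] = field(default_factory=list)
    placement: Optional[Constraints] = None    # stays None until finalize()
    cache_domain: str = "cpu"                  # who accessed the buffer last
    flushes: int = 0                           # counts modelled cache flushes

    def attach(self, dev: Device) -> None:
        # Step 2: importers attach before backing storage exists.
        # Attaching after finalize() fails if the placement does not fit.
        if self.placement is not None and not satisfies(self.placement, dev.constraints):
            raise ValueError(f"{dev.name} cannot use the finalized placement")
        self.attached.append(dev)

    def finalize(self) -> Constraints:
        # Step 3: take the most restrictive value of each constraint
        # across all attached devices and fix the placement.
        if not self.attached:
            raise ValueError("no devices attached")
        self.placement = Constraints(
            max_segments=min(d.constraints.max_segments for d in self.attached),
            dma_mask_bits=min(d.constraints.dma_mask_bits for d in self.attached),
        )
        return self.placement

    def begin_access(self, who: str) -> None:
        # Flush only when the cpu is on one side of the handover;
        # device-to-device handovers are assumed to be noops here.
        if who != self.cache_domain and "cpu" in (who, self.cache_domain):
            self.flushes += 1
        self.cache_domain = who


gpu = Device("gpu", Constraints(max_segments=64, dma_mask_bits=40))
camera = Device("camera", Constraints(max_segments=1, dma_mask_bits=32))

buf = Buffer(size=4096)       # step 1
buf.attach(gpu)               # step 2
buf.attach(camera)
placement = buf.finalize()    # step 3: contiguous, 32-bit reachable

for user in ("camera", "gpu", "cpu"):
    buf.begin_access(user)    # only cpu<->device handovers count as flushes
```

In this model a late attach from a device that needs, say, a 24-bit dma
mask raises ValueError, which corresponds to the "it would simply fail"
case above.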
If we'd merge this with the central allocator and optionally allow it
to swap out/move backing storage pages (and also back them with a fs
node ofc), then we could rip out a bit of code from drm drivers. I also
think that this would be the only approach that actually makes PRIME
work together with IOMMUs. There are some really old patches from Chris
Wilson to teach i915-gem to directly manage the backing storage
swapping, so patching this into the central allocator shouldn't be too
nefarious.

So that's my rough sketch of the brave new world I have in mind. Please
poke holes ;-)

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel