On Tue, Feb 2, 2021 at 6:04 AM Daniel Vetter <daniel@xxxxxxxx> wrote: > > On Fri, Jan 22, 2021 at 05:28:32PM -0800, John Stultz wrote: > > On Mon, Dec 21, 2020 at 2:09 PM Daniel Vetter <daniel@xxxxxxxx> wrote: > > > > > > On Fri, Dec 18, 2020 at 05:16:56PM -0800, John Stultz wrote: > > > > On Fri, Dec 18, 2020 at 6:36 AM Daniel Vetter <daniel@xxxxxxxx> wrote: > > > > > On Thu, Dec 17, 2020 at 11:06:11PM +0000, John Stultz wrote: > > > > > > Reuse/abuse the pagepool code from the network code to speed > > > > > > up allocation performance. > > > > > > > > > > > > This is similar to the ION pagepool usage, but tries to > > > > > > utilize generic code instead of a custom implementation. > > > > > > > > > > We also have one of these in ttm. I think we should have at most one of > > > > > these for the gpu ecosystem overall, maybe as a helper that can be plugged > > > > > into all the places. > > > > > > > > > > Or I'm kinda missing something, which could be since I only glanced at > > > > > yours for a bit. But it's also called page pool for buffer allocations, > > > > > and I don't think there's that many ways to implement that really :-) > > > > > > > > Yea, when I was looking around the ttm one didn't seem quite as > > > > generic as the networking one, which more easily fit in here. > > > > > > Oops, I didn't look that closely and didn't realize you're reusing the one > > > from net/core/. > > > > > > > The main benefit for the system heap is not so much the pool itself > > > > (the normal page allocator is pretty good), as it being able to defer > > > > the free and zero the pages in a background thread, so the pool is > > > > effectively filled with pre-zeroed pages. > > > > > > > > But I'll take another look at the ttm implementation and see if it can > > > > be re-used or the shared code refactored and pulled out somehow. > > > > > > I think moving the page_pool from net into lib and using it in ttm might > > > also be an option. Lack of shrinker in the networking one might be a bit a > > > problem. > > > > Yea. I've been looking at this, to see how to abstract out a generic > > pool implementation, but each pool implementation has been tweaked for > > the specific use cases, so a general abstraction is a bit tough right > > off. > > > > For example the ttm pool's handling allocations both from alloc_pages > > and dma_alloc in a pool, where the net page pool only uses alloc_pages > > (but can pre map via dma_map_attr). > > > > And as you mentioned, the networking page pool is statically sized > > where the ttm pool is dynamic and shrinker controlled. > > > > Further, as the ttm pool is utilized for keeping pools of pages set > > for specific cache types, it makes it difficult to abstract that out > > as we have to be able to reset the caching (set_pages_wb()) when > > shrinking, so that would also have to be pushed down into the pool > > attributes as well. > > > > So far, in my attempts to share an abstraction for both the net > > page_pool and the ttm page pool, it seems to make the code complexity > > worse on both sides - so while I'm interested in continuing to try to > > find a way to share code here, I'm not sure it makes sense to hold up > > this series (which is already re-using an existing implementation and > > provide a performance bump in microbenchmarks) for the > > grand-unified-page-pool. Efforts to refactor the ttm pool and net page > > pool can continue on indepently, and I'd be happy to move the system > > heap to whatever that ends up being. > > The thing is, I'm not sure sharing code with net/core is a really good > idea, at least it seems like we have some impendence mismatch with the ttm > pool. And going forward I expect sooner or later we need alignment between > the pools/caches under drm with dma-buf heap pools a lot more than between > dma-buf and net/core. I mean... I don't think you're wrong here, but it was your suggestion. > So this feels like a bit code sharing for code sharing's sake and not > where it makes sense. Expecting net/core and gpu stacks to have the exact > same needs for a page pool allocator has good chances to bite us in the > long run. Again, I agree with you at the high level here (dmabuf system heap and ttm page pooling are conceptually more likely to align, and duplication of buffer pools is non-optimal), but there's still the practical aspect of the ttm pool being pretty tied to the ttm code (utilizing ttm contexts, fixed MAX_ORDER*TTM_NUM_CACHING_TYPES subpools per pool + 4 global sub-pools for only x86). So... I guess I'll go for another pass at trying to pull something generic out of the ttm_pool, but the cynic in me suspects folks will just object to any inefficiencies added in order to do so (the code-sharing for its own sake argument above) and I'll be back to where I am now. But we'll see. thanks -john