Quoting Matthew Auld (2020-07-06 20:06:38)
> On Mon, 6 Jul 2020 at 07:19, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > The GEM object is grossly overweight for the practicality of tracking
> > large numbers of individual pages, yet it is currently our only
> > abstraction for tracking DMA allocations. Since those allocations need
> > to be reserved upfront before an operation, and that we need to break
> > away from simple system memory, we need to ditch using plain struct page
> > wrappers.
> >
> > In the process, we drop the WC mapping as we ended up clflushing
> > everything anyway due to various issues across a wider range of
> > platforms. Though in a future step, we need to drop the kmap_atomic
> > approach which suggests we need to pre-map all the pages and keep them
> > mapped.
> >
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >
> <snip>

The other thing I'm toying with is whether to keep the unused preallocs
around on the ppgtt. The cost of the conservative allocations is
decidedly high in CI [thanks to memdebug tracking and verifying], as
each PD is itself a 4KiB table of pointers (as well as the 4KiB dma
page). Outside of CI, the issue is not as pressing, and if a workload
does not reach steady state quickly, then the extra allocations are just
one of many worries.

For the steady state, we benefit from not having surplus pages trapped
in the ppgtt. That is the danger of caching: when should we trim it?
[Previously we only allocated on demand, but kept a *small* number of WC
pages around because converting a page to/from WC was expensive.]

If there's a good answer for when we can/should free the surplus cache,
it's probably worth pursuing. Or we may deem it worthwhile to keep the
cache limited to 15 entries [reusing a pagevec].

Overallocation is the pita of having to preallocate: we basically have
to have at least 2 PD for each level + the actual span, for every vma,
even when bundling the insertions, as we don't know which entries will
be used until much later. So we almost certainly overallocate 4 PD
[16KiB system + 16KiB dma] for every single vma. Even a 15 entry stash
will be quickly exhausted; oh well.
-Chris
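
As a rough illustration of the overallocation arithmetic above, here is a
back-of-the-envelope sketch. It is not code from the patch: the constant
and function names are hypothetical, and the figures (4KiB pointer table
plus 4KiB dma page per PD, 4 surplus PD per vma, a 15 entry stash) are
taken directly from the message.

```python
KIB = 1024

# Per the message: each PD costs a 4KiB table of pointers in system
# memory plus a 4KiB dma page. (Names here are illustrative only.)
PD_SYSTEM_BYTES = 4 * KIB
PD_DMA_BYTES = 4 * KIB


def surplus_bytes(surplus_pds):
    """Return (system, dma) bytes tied up by `surplus_pds` unused PDs."""
    return surplus_pds * PD_SYSTEM_BYTES, surplus_pds * PD_DMA_BYTES


def vmas_covered_by_stash(stash_entries, surplus_per_vma):
    """Whole vmas' worth of surplus PDs that fit in a fixed-size stash."""
    return stash_entries // surplus_per_vma


system, dma = surplus_bytes(4)
print(f"surplus per vma: {system // KIB}KiB system + {dma // KIB}KiB dma")
print(f"vmas covered by a 15 entry stash: {vmas_covered_by_stash(15, 4)}")
```

Under those assumptions, a 15 entry stash accounts for the surplus of only
three vmas, which is one way to read "quickly exhausted".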