On Wed, Mar 24, 2021 at 01:00:28PM +0100, Christian König wrote:
> Am 24.03.21 um 12:55 schrieb Daniel Vetter:
> > On Wed, Mar 24, 2021 at 11:19:13AM +0100, Thomas Hellström (Intel) wrote:
> > > On 3/23/21 4:45 PM, Christian König wrote:
> > > > Am 23.03.21 um 16:13 schrieb Michal Hocko:
> > > > > On Tue 23-03-21 14:56:54, Christian König wrote:
> > > > > > Am 23.03.21 um 14:41 schrieb Michal Hocko:
> > > > > [...]
> > > > > > > Anyway, I am wondering whether the overall approach is sound.
> > > > > > > Why don't you simply use shmem as your backing storage from
> > > > > > > the beginning and pin those pages if they are used by the
> > > > > > > device?
> > > > > > Yeah, that is exactly what the Intel guys are doing for their
> > > > > > integrated GPUs :)
> > > > > >
> > > > > > The problem is that for TTM I need to be able to handle dGPUs,
> > > > > > and those have all kinds of funny allocation restrictions. In
> > > > > > other words, I need to guarantee that the allocated memory is
> > > > > > coherently accessible to the GPU without using SWIOTLB.
> > > > > >
> > > > > > The simple case is that the device can only do DMA32, but you
> > > > > > also get devices which can only do 40 bits or 48 bits.
> > > > > >
> > > > > > On top of that you also have AGP, CMA and stuff like CPU cache
> > > > > > behavior changes (write-back vs. write-through vs. uncached).
> > > > > OK, so the underlying problem seems to be that the gfp mask (thus
> > > > > mapping_gfp_mask) cannot really reflect your requirements, right?
> > > > > Would it help if shmem allowed providing an allocation callback
> > > > > to override alloc_page_vma, which is used currently? I am pretty
> > > > > sure there will be more to handle, but going through shmem for
> > > > > the whole lifetime is just so much easier to reason about than
> > > > > some tricks to abuse shmem just for the swapout path.
> > > > Well, it's a start, but the pages can have special CPU cache
> > > > settings, so direct I/O from/to them usually doesn't work as
> > > > expected.
> > > >
> > > > Additionally, for AGP and CMA I need to make sure that I give those
> > > > pages back to the relevant subsystems instead of just dropping the
> > > > page reference.
> > > >
> > > > So I would need to block until the swap I/O has completed.
> > > >
> > > > Anyway, I probably need to revert those patches for now since this
> > > > isn't working as we hoped it would.
> > > >
> > > > Thanks for the explanation of how stuff works here.
> > > Another alternative here, which I've tried before without success,
> > > would perhaps be to drop shmem completely and, if it's a normal page
> > > (no DMA or funny caching attributes), just use add_to_swap_cache()?
> > > If it's something else, try to allocate a page with the relevant gfp
> > > attributes, copy, and add_to_swap_cache()? Or perhaps that doesn't
> > > work well from a shrinker either?
> > So before we toss everything and go on a great rewrite-the-world tour,
> > what if we just try to split up big objects? So for objects which are
> > bigger than e.g. 10 MB:
> >
> > - move them to a special "under eviction" list
> > - keep a note of how far we have evicted thus far
> > - interleave allocating shmem pages, copying data and releasing the
> >   ttm backing store on a chunk basis (maybe 10 MB or whatever, tuning
> >   tbh)
> >
> > If that's not enough, occasionally break out of the shrinker entirely
> > so other parts of reclaim can reclaim the shmem stuff. But just
> > releasing our own pages as we go should help a lot, I think.
>
> Yeah, the latter is exactly what I am currently prototyping.
>
> I just didn't use a limit but rather a list of only-partially-evicted
> BOs, which is used when we fail to allocate a page.
>
> For the 5.12 cycle I think we should just go back to a hard 50% limit
> for now and then resurrect this when we have solved the issues.

Can we do the 50% limit without tossing out all the code we've done thus
far? Just so this doesn't get too disruptive.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
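As a rough illustration of the chunked-eviction idea Daniel sketches above,
here is a minimal, self-contained C model. Every name in it (struct
chunk_bo, evict_one_chunk, alloc_and_copy_shmem_chunk, release_ttm_chunk,
EVICT_CHUNK_BYTES) is a hypothetical stand-in rather than real TTM or shmem
API; it only shows the per-chunk bookkeeping: remember how far an object has
been evicted, copy one chunk into shmem, release the corresponding part of
the backing store, and bail out on allocation failure so the object stays on
a partially-evicted list.

/*
 * Minimal model of chunk-wise eviction: track progress per object and
 * interleave shmem allocation/copy with releasing the old backing store.
 * All types and helpers are hypothetical stand-ins, not kernel API.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define EVICT_CHUNK_BYTES ((size_t)10 << 20)  /* ~10 MiB per step, tuning tbd */

struct chunk_bo {
	size_t size;     /* total size of the backing store        */
	size_t evicted;  /* note of how far we have evicted so far */
};

/* Stand-in: allocate shmem pages for [off, off + len) and copy the data.
 * Returning false models an allocation failure, in which case the object
 * stays on the partially-evicted list and is retried later. */
static bool alloc_and_copy_shmem_chunk(struct chunk_bo *bo, size_t off, size_t len)
{
	(void)bo; (void)off; (void)len;
	return true;
}

/* Stand-in: release the now-copied chunk of the old backing store. */
static void release_ttm_chunk(struct chunk_bo *bo, size_t off, size_t len)
{
	(void)bo; (void)off; (void)len;
}

/*
 * Evict one chunk and remember the progress, so the shrinker can return
 * to its caller between chunks and other parts of reclaim get a chance to
 * pick up the freshly allocated shmem pages.  Returns true once the object
 * is fully evicted and can be dropped from the "under eviction" list.
 */
static bool evict_one_chunk(struct chunk_bo *bo)
{
	size_t len = bo->size - bo->evicted;

	if (len == 0)
		return true;
	if (len > EVICT_CHUNK_BYTES)
		len = EVICT_CHUNK_BYTES;

	if (!alloc_and_copy_shmem_chunk(bo, bo->evicted, len))
		return false;  /* keep it on the partially-evicted list */

	release_ttm_chunk(bo, bo->evicted, len);
	bo->evicted += len;
	return bo->evicted == bo->size;
}

int main(void)
{
	struct chunk_bo bo = { .size = (size_t)64 << 20, .evicted = 0 };
	int steps = 0;

	while (!evict_one_chunk(&bo))
		steps++;
	printf("evicted %zu bytes in %d partial steps\n", bo.size, steps + 1);
	return 0;
}

A real shrinker would of course walk the "under eviction" list and do
something like one such step per object per invocation, occasionally
returning to the caller so the rest of reclaim can reclaim the newly
allocated shmem pages.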