On Thu, Dec 17, 2020 at 4:36 PM Christian König <christian.koenig@xxxxxxx> wrote:
>
> On 17.12.20 at 16:26, Daniel Vetter wrote:
> > On Thu, Dec 17, 2020 at 4:10 PM Christian König
> > <christian.koenig@xxxxxxx> wrote:
> >> On 17.12.20 at 15:36, Daniel Vetter wrote:
> >>> On Thu, Dec 17, 2020 at 2:46 PM Christian König
> >>> <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
> >>>> On 16.12.20 at 16:09, Daniel Vetter wrote:
> >>>>> On Wed, Dec 16, 2020 at 03:04:26PM +0100, Christian König wrote:
> >>>>> [SNIP]
> >>>>>> +
> >>>>>> +/* As long as pages are available make sure to release at least one */
> >>>>>> +static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink,
> >>>>>> +					  struct shrink_control *sc)
> >>>>>> +{
> >>>>>> +	struct ttm_operation_ctx ctx = {
> >>>>>> +		.no_wait_gpu = true
> >>>>>
> >>>>> Iirc there's an eventual shrinker limit where it gets desperate. I think
> >>>>> once we hit that, we should allow gpu waits. But it's not passed to
> >>>>> shrinkers for reasons, so maybe we should have a second round that tries
> >>>>> to more actively shrink objects if we fell substantially short of what
> >>>>> reclaim expected us to do?
> >>>>
> >>>> I think we should try to avoid waiting for the GPU in the shrinker callback.
> >>>>
> >>>> When we get HMM we will have cases where the shrinker is called from
> >>>> there and we can't wait for the GPU then without causing deadlocks.
> >>>
> >>> Uh, that doesn't work. Also, the current rules are that you are allowed
> >>> to call dma_fence_wait from shrinker callbacks, so that ship sailed
> >>> already. This is because shrinkers are a less restrictive context than
> >>> mmu notifier invalidation, and we wait in there too.
> >>>
> >>> So if you can't wait in shrinkers, you also can't wait in mmu
> >>> notifiers (and also not in HMM, which is the same thing). Why do you
> >>> need this?
> >>
> >> The core concept of HMM is that pages are faulted in on demand and it is
> >> perfectly valid for one of those pages to be on disk.
> >>
> >> So when a page fault happens we might need to be able to allocate memory
> >> and fetch something from disk to handle that.
> >>
> >> When this memory allocation then in turn waits for the GPU which is
> >> running the HMM process we are pretty much busted.
> >
> > Yeah, you can't do that. That's the entire infinite fences discussion.
>
> Yes, exactly.
>
> > For HMM to work, we need to stop using dma_fence for userspace sync,
>
> I was considering separating that into a dma_fence and a hmm_fence.
> Or something like this.

The trouble is that dma_fence in all its forms is uapi. And on gpus
without page fault support dma_fence_wait is still required in
allocation contexts. So creating a new kernel structure doesn't really
solve anything I think; it needs entirely new uapi, completely decoupled
from memory management. The last time we did new uapi was probably
modifiers, and that's still not rolled out years later.

> > and you can only use the amdkfd style preempt fences. And preempting
> > while the pagefault is pending is I thought something we require.
>
> Yeah, problem is that most hardware can't do that :)
>
> Getting page faults to work is hard enough, preempting while waiting for
> a fault to return is not something which was anticipated :)

Hm, last summer in a thread you said you'd blocked that because it
doesn't work. I agreed, page fault without preempt is rather tough to
make work.

> > Iow, the HMM page fault handler must not be a dma-fence critical
> > section, i.e. it's not allowed to hold up any dma_fence, ever.
>
> What do you mean with that?

dma_fence_signalling_begin/end() annotations essentially, i.e.
cross-release dependencies. Or the other way round, if you want to be
able to allocate memory you have to guarantee that you're never
holding up a dma_fence.

-Daniel
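(For illustration, a minimal sketch of what that annotation pattern looks
like in practice, using the dma_fence_begin_signalling()/
dma_fence_end_signalling() helpers from include/linux/dma-fence.h; the
my_job structure and function below are made up, not from the thread:)

#include <linux/dma-fence.h>

struct my_job {
	struct dma_fence *done_fence;	/* fence others may be waiting on */
};

static void my_job_complete(struct my_job *job)
{
	/* Everything up to dma_fence_end_signalling() is a dma-fence
	 * critical section: no GFP_KERNEL allocations and no locks that
	 * reclaim paths can take, or lockdep will (rightly) complain. */
	bool cookie = dma_fence_begin_signalling();

	dma_fence_signal(job->done_fence);

	dma_fence_end_signalling(cookie);
}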
> > One consequence of this is that you can use HMM for compute, but until
> > we've revamped all the linux winsys layers, not for gl/vk. Or at least
> > I'm not seeing how.
> >
> > Also like I said, dma_fence_wait is already allowed in mmu notifiers,
> > so we've already locked down these semantics even more. Due to the
> > nesting of gfp allocation contexts, allowing dma_fence_wait in mmu
> > notifiers (i.e. __GFP_ALLOW_RECLAIM or whatever the flag is exactly)
> > implies it's allowed in shrinkers. And only if you forbid it from
> > all allocation contexts (which makes all buffer object managed gpu
> > memory essentially pinned, exactly what you're trying to lift here) do
> > you get what you want.
> >
> > The other option is to make HMM and dma-buf completely disjoint worlds
> > with no overlap, and gang scheduling on the gpu (to guarantee that
> > there's never any dma_fence in pending state while an HMM task might
> > cause a fault).
> >
> >> [SNIP]
> >>> So where do you want to recurse here?
> >>
> >> I wasn't aware that without __GFP_FS shrinkers are not called.
> >
> > Maybe double check, but that's at least my understanding. GFP flags
> > are flags, but in reality it's a strictly nesting hierarchy:
> > GFP_KERNEL > GFP_NOFS > GFP_NOIO > GFP_RECLAIM > GFP_ATOMIC (ok, atomic
> > is special, since it's allowed to dip into the emergency reserve).
>
> Going to read myself into that over the holidays.
>
> Christian.

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
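(Similarly, a minimal sketch of how the __GFP_FS point typically shows up
in shrinker code: the scan callback checks the reclaim context in
sc->gfp_mask and backs off, the same pattern the superblock shrinker
uses. Only the function name is taken from the quoted patch; the body is
an assumption for illustration:)

#include <linux/gfp.h>
#include <linux/shrinker.h>

static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink,
					  struct shrink_control *sc)
{
	/* Reclaim was entered from a GFP_NOFS (or stricter) allocation:
	 * tell the core we can't make progress in this context. */
	if (!(sc->gfp_mask & __GFP_FS))
		return SHRINK_STOP;

	/* ... release pages here and return the number actually freed ... */
	return 0;
}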