Re: [PATCH 1/2] drm/ttm: rework ttm_tt page limit v2

Am 17.12.20 um 16:26 schrieb Daniel Vetter:
On Thu, Dec 17, 2020 at 4:10 PM Christian König
<christian.koenig@xxxxxxx> wrote:
Am 17.12.20 um 15:36 schrieb Daniel Vetter:
On Thu, Dec 17, 2020 at 2:46 PM Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:
Am 16.12.20 um 16:09 schrieb Daniel Vetter:
On Wed, Dec 16, 2020 at 03:04:26PM +0100, Christian König wrote:
[SNIP]
+
+/* As long as pages are available make sure to release at least one */
+static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink,
+                                      struct shrink_control *sc)
+{
+    struct ttm_operation_ctx ctx = {
+            .no_wait_gpu = true
IIRC there's an eventual shrinker limit where it gets desperate. I think
once we hit that, we should allow GPU waits. But it's not passed to
shrinkers for reasons, so maybe we should have a second round that tries
to more actively shrink objects if we fell substantially short of what
reclaim expected us to do?
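
A minimal sketch of what such a two-pass scan could look like, just to make
the idea concrete; ttm_tt_shrink_some() is a hypothetical helper standing in
for whatever actually releases pages from a ttm_tt, it is not part of the
posted patch:

static unsigned long ttm_tt_shrinker_scan_idea(struct shrinker *shrink,
					       struct shrink_control *sc)
{
	struct ttm_operation_ctx ctx = { .no_wait_gpu = true };
	unsigned long freed;

	/* First pass: only reclaim idle pages, never block on the GPU. */
	freed = ttm_tt_shrink_some(&ctx, sc->nr_to_scan);

	/*
	 * If we fell far short of what reclaim asked for, try again and
	 * this time allow waiting for the GPU to idle buffers.
	 */
	if (freed < sc->nr_to_scan / 2) {
		ctx.no_wait_gpu = false;
		freed += ttm_tt_shrink_some(&ctx, sc->nr_to_scan - freed);
	}

	return freed ? freed : SHRINK_STOP;
}
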
I think we should try to avoid waiting for the GPU in the shrinker callback.

When we get HMM we will have cases where the shrinker is called from
there and we can't wait for the GPU then without causing deadlocks.
Uh, that doesn't work. Also, the current rules are that you are allowed
to call dma_fence_wait from shrinker callbacks, so that ship has sailed
already. This is because shrinkers are a less restrictive context than
mmu notifier invalidation, and we wait in there too.

So if you can't wait in shrinkers, you also can't wait in mmu
notifiers (and also not in HMM, which is the same thing). Why do you
need this?
The core concept of HMM is that pages are faulted in on demand and it is
perfectly valid for one of those pages to be on disk.

So when a page fault happens we might need to be able to allocate memory
and fetch something from disk to handle that.

When this memory allocation then in turn waits for the GPU which is
running the HMM process, we are pretty much busted.
Yeah, you can't do that. That's the entire infinite fences discussion.

Yes, exactly.

For HMM to work, we need to stop using dma_fence for userspace sync,

I was considering separating that into a dma_fence and an hmm_fence, or something like that.

and you can only use the amdkfd style preempt fences. And preempting
while the page fault is pending is, I thought, something we require.

Yeah, problem is that most hardware can't do that :)

Getting page faults to work is hard enough; preempting while waiting for a fault to return is not something which was anticipated :)

IOW, the HMM page fault handler must not be a dma-fence critical
section, i.e. it's not allowed to hold up any dma_fence, ever.

What do you mean with that?

One consequence of this is that you can use HMM for compute, but until
we've revamped all the linux winsys layers, not for gl/vk. Or at least
I'm not seeing how.

Also like I said, dma_fence_wait is already allowed in mmu notifiers,
so we've already locked down these semantics even more. Due to the
nesting of gfp allocation contexts allowing dma_fence_wait in mmu
notifiers (i.e. __GFP_ALLOW_RECLAIM or whatever the flag is exactly)
implies it's allowed in shrinkers. And only if you forbid it from
all allocation contexts (which makes all buffer-object-managed GPU
memory essentially pinned, exactly what you're trying to lift here) do
you get what you want.

The other option is to make HMM and dma-buf completely disjoint worlds
with no overlap, and use gang scheduling on the GPU (to guarantee that
there's never any dma_fence in pending state while an HMM task might
cause a fault).

[SNIP]
So where do you want to recurse here?
I wasn't aware that without __GFP_FS shrinkers are not called.
Maybe double-check, but that's at least my understanding. GFP flags
are flags, but in reality it's a strictly nesting hierarchy:
GFP_KERNEL > GFP_NOFS > GFP_NOIO > GFP_RECLAIM > GFP_ATOMIC (OK, atomic
is special, since it's allowed to dip into the emergency reserves).
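
In code terms the nesting just drops bits as you go down (compare the
definitions in include/linux/gfp.h), which is also why an individual
shrinker can look at the allocation context it was called from;
my_shrinker_scan() and my_release_pages() below are hypothetical, purely
for illustration:

/*
 *   GFP_KERNEL = __GFP_RECLAIM | __GFP_IO | __GFP_FS
 *   GFP_NOFS   = __GFP_RECLAIM | __GFP_IO
 *   GFP_NOIO   = __GFP_RECLAIM
 */
static unsigned long my_shrinker_scan(struct shrinker *shrink,
				      struct shrink_control *sc)
{
	/*
	 * Called from a GFP_NOFS/GFP_NOIO allocation: don't touch
	 * anything that could recurse into the filesystem.
	 */
	if (!(sc->gfp_mask & __GFP_FS))
		return SHRINK_STOP;

	return my_release_pages(sc->nr_to_scan);
}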

Going to read myself into that over the holidays.

Christian.
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel



