Re: Need to support mixed memory mappings with HMM

On 26.03.21 at 15:49, Felix Kuehling wrote:
On 2021-03-26 at 4:50 a.m., Christian König wrote:

On 25.03.21 at 17:28, Felix Kuehling wrote:
On 2021-03-25 at 12:22 p.m., Christian König wrote:
My idea is to change the amdgpu_vm_update_mapping interface to use
some high bits in the pages_addr array to indicate the page type. For
the default page type (0) nothing really changes for the callers. The
"flags" parameter needs to become a pointer to an array that gets
indexed by the high bits from the pages_addr array. For existing
callers it's as easy as changing flags to &flags (array of size 1).
For HMM we would pass a pointer to a real array.
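
For illustration, such an encoding could look like the sketch below.
The names, bit positions and helpers are hypothetical, not the actual
amdgpu interface:

	/* Top two bits of each pages_addr entry select an entry in a
	 * flags array, so one mapping call can cover mixed page types.
	 */
	#define PAGE_TYPE_SHIFT	62
	#define PAGE_TYPE_MASK	(3ULL << PAGE_TYPE_SHIFT)

	static unsigned int page_type(uint64_t addr)
	{
		return addr >> PAGE_TYPE_SHIFT;
	}

	static uint64_t page_dma_addr(uint64_t addr)
	{
		return addr & ~PAGE_TYPE_MASK;
	}

	/* Existing callers with a single page type pass &flags (an
	 * array of size 1); HMM would pass a real array, e.g.:
	 *
	 *	uint64_t flags[2] = { system_flags, vram_flags };
	 */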
Yeah, I've thought about stuff like that as well for a while.

Problem is that this won't work that easily. We assume at many places
that the flags don't change for the range in question.
I think some lower level functions assume that the flags stay the same
for physically contiguous ranges. But if you use the high bits to
encode the page type, the ranges won't be contiguous any more. So you
can use different page flags for different contiguous ranges.
Yeah, but then you also get absolutely zero THP and fragment flags
support.
As long as you have a contiguous 2MB page with the same page type, I
think you can still get a THP mapping in the GPU page table. But if
one page in the middle of a 2MB page has a different page type, that
will break the THP mapping, as it should.
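
For illustration, that huge-page condition could be checked like the
hypothetical helper below, reusing the page_type()/page_dma_addr()
sketch above (in reality the detection happens while walking the page
tables):

	#define PAGES_PER_2MB	512	/* 4K pages per 2MB huge page */

	static bool can_use_huge_pte(const uint64_t *pages_addr,
				     uint64_t pfn)
	{
		uint64_t i;

		for (i = 1; i < PAGES_PER_2MB; ++i) {
			/* must stay physically contiguous */
			if (page_dma_addr(pages_addr[pfn + i]) !=
			    page_dma_addr(pages_addr[pfn + i - 1]) +
			    PAGE_SIZE)
				return false;
			/* one odd page type in the middle breaks the
			 * 2MB mapping */
			if (page_type(pages_addr[pfn + i]) !=
			    page_type(pages_addr[pfn]))
				return false;
		}
		return true;
	}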
Yeah, but currently we detect that before we call down into the
functions that update the tables.

When you pass in a mixed list, the chance to do that is completely
gone.
Currently the detection of contiguous ranges is in
amdgpu_vm_update_mapping:

	if (num_entries > AMDGPU_GPU_PAGES_IN_CPU_PAGE) {
		uint64_t pfn = cursor.start >> PAGE_SHIFT;
		uint64_t count;

		contiguous = pages_addr[pfn + 1] ==
			pages_addr[pfn] + PAGE_SIZE;

		tmp = num_entries / AMDGPU_GPU_PAGES_IN_CPU_PAGE;
		for (count = 2; count < tmp; ++count) {
			uint64_t idx = pfn + count;

			if (contiguous != (pages_addr[idx] ==
					   pages_addr[idx - 1] + PAGE_SIZE))
				break;
		}
		num_entries = count * AMDGPU_GPU_PAGES_IN_CPU_PAGE;
	}
If a page type changes from one page to the next, it will end a
contiguous range and call into the lower level (amdgpu_vm_update_ptes).
I don't think anything needs to change in amdgpu_vm_update_ptes,
because all contiguous pages passed to it use the same page type, and
therefore the same flags. And the existing code in
amdgpu_vm_update_mapping already does the right thing. I honestly
don't see the problem.
But then you could also just call amdgpu_vm_update_mapping() multiple
times, feeding it from whatever data structure you use in the HMM
code.

The page array sounds to me like an intermediate data structure that
you only created to feed into amdgpu_vm_update_mapping().
You're right. It could be done: one new call to
amdgpu_vm_update_mapping() every time the flags change as we iterate
over virtual addresses. It should work OK as long as the mappings of
different memory types aren't too finely interleaved.
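
A sketch of that iteration, assuming a per-page flags array on the HMM
side and a hypothetical simplified map_range() wrapper standing in for
the full amdgpu_vm_update_mapping() signature:

	/* Hypothetical stand-in for one homogeneous
	 * amdgpu_vm_update_mapping() call. */
	int map_range(uint64_t start, uint64_t npages,
		      dma_addr_t *pages_addr, uint64_t flags);

	static int map_mixed_range(uint64_t start, uint64_t npages,
				   dma_addr_t *pages_addr,
				   const uint64_t *page_flags)
	{
		uint64_t first = 0, i;
		int r;

		for (i = 1; i <= npages; ++i) {
			/* extend the current run while the flags match */
			if (i < npages &&
			    page_flags[i] == page_flags[first])
				continue;

			/* flags changed (or end of range): map the run
			 * with a single set of flags */
			r = map_range(start + first * PAGE_SIZE,
				      i - first, &pages_addr[first],
				      page_flags[first]);
			if (r)
				return r;
			first = i;
		}
		return 0;
	}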

I suggest that we stick to this approach for now.

The only alternative I see is that we have a unified address space for
DMA addresses, similar to what you have outlined.

This way we can easily figure out, for each address, which device it
belongs to and then calculate the best path to that device.

E.g. using the direct address for system memory, clearing the S bit
for local memory, using XGMI for remote memory, etc.

That is similar to what you have in mind, just a bit more generalized.
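
A very rough sketch of that idea, with purely hypothetical location
bits (AMDGPU_PTE_SYSTEM is the only real flag used here):

	/* Hypothetical: high bits of a unified DMA address encode
	 * where the memory lives. */
	enum mem_location {
		MEM_SYSTEM,	/* system memory: direct address */
		MEM_LOCAL,	/* local VRAM: clear the S bit */
		MEM_REMOTE,	/* peer VRAM: route over XGMI */
	};

	static enum mem_location addr_location(uint64_t addr)
	{
		return addr >> 62;	/* hypothetical location bits */
	}

	static uint64_t encode_pte(uint64_t addr, uint64_t flags,
				   uint64_t xgmi_peer_offset)
	{
		uint64_t pa = addr & ~(3ULL << 62);

		switch (addr_location(addr)) {
		case MEM_SYSTEM:
			return pa | flags | AMDGPU_PTE_SYSTEM;
		case MEM_LOCAL:
			return pa | (flags & ~AMDGPU_PTE_SYSTEM);
		case MEM_REMOTE:
			/* hypothetical per-peer XGMI aperture offset */
			return (pa + xgmi_peer_offset) | flags;
		}
		return 0;
	}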

Regards,
Christian.


Regards,
   Felix


Regards,
Christian.

Regards,
    Felix


Regards,
Christian.
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
