On Fri, Feb 20, 2015 at 7:21 PM, Jan Vesely <jan.vesely@xxxxxxxxxxx> wrote:
> Hi,
>
> thank you for the exhaustive answer. I have a few more questions/clarifications:
> is the DMA address used to access system pages further translated using an IOMMU (if present), or are GPUs treated specially?

Yes, the address may be further translated via an IOMMU. That's why I said dma address rather than bus address or physical address.

> I have only seen references to TLB flushes, so I guess invalidating individual entries is not supported?

You can flush individual vmids, but not individual entries.

> does it mean that if a page needs to be moved/migrated a complete VMID TLB flush is required?

Yes.

> I was a bit surprised to find out about PCIe cache snooping, since the work I have seen before assumes DMA is not cache coherent. I guess there's a latency penalty for using it; do you have any idea how much worse it gets (relative to non-snooped access)?

It is slower than non-snooped. I don't remember the numbers off hand. I think some of the compute documents or HSA developer summit presentations on the AMD developer site go into the details.

Alex

> thanks again,
> jan
>
> On Fri, 2015-02-20 at 17:19 -0500, Alex Deucher wrote:
>> On Fri, Feb 20, 2015 at 12:35 PM, Jan Vesely <jan.vesely@xxxxxxxxxxx> wrote:
>> > Hello radeon devs,
>> >
>> > I have been trying to find out more about the VM implementation on SI+ hw, but unfortunately I could not find much in the public documents [0].
>> >
>> > The SI ISA manual suggests that there is a limited form of privileged mode on these chips, so I wondered if it could be used for VM management too (the docs only deal with numerical exceptions). Or does it always have to be handled by the host (driver)?
>>
>> These are related to trap/exception privilege, for debugging for example. I'm not that familiar with how that stuff works. It's unrelated to GPUVM.
>>
>> > One of the older patches [1] mentions different page sizes. Is there any public documentation on things like the page table format and the GPU MMU hierarchy? I could only get a limited picture going through the code and comments.
>>
>> There is not any public documentation on the VM hardware other than what is available in the driver, but I can try to give you an overview of how it works.
>> There are 16 VM contexts (8 on cayman/TN/RL) on the GPU that can be active at any given time. GPUVM supports a 40 bit address space. Each context has an id; we call them vmids.
>> vmid 0 is a bit special. It's called the system context and behaves a bit differently from the others. It's designed for the kernel driver's view of GPU accessible memory. I can go into further detail if you want, but I don't think it's critical for this discussion; just think of it as the context used by the kernel driver.
>> That leaves 15 contexts (7 on cayman/TN/RL) available for use by user clients. vmid 0 has its own set of configuration registers, while vmids 1-15 share the same configuration (other than the page tables); e.g., contexts 1-15 all have to use either single level or 2 level page tables.
>> You select which VM context is used for a particular command buffer via a field in the command buffer packet. Some engines (e.g., UVD or the display hardware) do not support VM, so they always use vmid 0. Right now only the graphics, compute, and DMA engines support VM.
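
To make the vmid handling above a bit more concrete, here is a rough sketch in C. It is not the actual radeon code; the struct and helper names (vmid_mgr, write_vm_invalidate_request) are made up for illustration, but it shows the general shape of handing out the 15 user vmids and flushing a whole vmid's TLB whenever its mappings change, since individual entries can't be invalidated:

/* Rough sketch only -- not the radeon driver's code. */
#include <stdbool.h>
#include <stdint.h>

#define NUM_VMIDS       16   /* 8 on cayman/TN/RL */
#define FIRST_USER_VMID  1   /* vmid 0 is reserved for the kernel driver */

struct vmid_mgr {
    bool in_use[NUM_VMIDS];
};

/* Stand-in for whatever register write requests a TLB flush on the hw. */
static void write_vm_invalidate_request(uint32_t vmid_mask)
{
    (void)vmid_mask; /* would be an MMIO write in a real driver */
}

/* Hand out one of the user vmids (1-15). */
static int alloc_vmid(struct vmid_mgr *mgr)
{
    int i;

    for (i = FIRST_USER_VMID; i < NUM_VMIDS; i++) {
        if (!mgr->in_use[i]) {
            mgr->in_use[i] = true;
            return i;
        }
    }
    return -1; /* all busy; a real driver would wait for or recycle one */
}

/* There is no per-entry invalidate, so any page table update visible to a
 * vmid means flushing that vmid's entire TLB. */
static void flush_vmid_tlb(int vmid)
{
    write_vm_invalidate_request(1u << vmid);
}

The only real point is the last function: reusing a vmid for a different set of page tables, or migrating a page mapped into it, always implies a full flush of that vmid.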
>>
>> With single level page tables, you just have a big array of page table entries (PTEs) that represents the entire virtual address space. With multi-level page tables, the address space is represented by an array of page directory entries (PDEs) that point to page table blocks (PTBs), which are arrays of PTEs.
>>
>> PTEs and PDEs are 64 bits per entry.
>>
>> PDE:
>> 39:12 - PTB address
>> 0 - PDE valid (the entry is valid)
>>
>> PTE:
>> 39:12 - page address
>> 11:7 - fragment
>> 6 - write
>> 5 - read
>> 2 - CPU cache snoop (for accessing cached system memory)
>> 1 - system (page is in system memory rather than vram)
>> 0 - PTE valid (the entry is valid)
>>
>> Fragment needs some explanation. The logical/physical fragment size in bytes = 2 ^ (12 + fragment), so a fragment value of 0 means 4k, 1 means 8k, etc. The logical address must be aligned to the fragment size, and the memory backing it must be contiguous and at least as large as the fragment size. Larger fragment sizes reduce the pressure on the TLB since fewer entries are required for the same amount of memory.
>>
>> For system pages, the page address is the dma address of the page. The system bit must be set, and the snoop bit can optionally be set depending on whether you are using cacheable memory.
>>
>> For vram pages, the address is the GPU physical address of vram (it starts at 0 on dGPUs and at MC_VM_FB_OFFSET, the dma address of the "vram" carve out, on APUs).
>>
>> You can also adjust the page table block size, which controls the number of pages per PTB and therefore how many PDEs you need to cover the address space. E.g., if you set the block size to 0, each PTB is 4k and holds 512 PTEs; if you set it to 1, each PTB is 8k and holds 1024 PTEs, etc.
>>
>> GPUVM is only concerned with memory management and protection. There are other protection features in other hw blocks for things beyond memory. For example, on CI and newer asics, the CP and SDMA blocks execute command buffers in a secure mode that limits them to accessing only registers that are relevant for those blocks (e.g., gfx or compute state registers, but not display registers) or to executing only certain packets.
>>
>> I hope this helps. Let me know if you have any more questions.
>>
>> Alex
>>
>> > thank you,
>> > Jan
>> >
>> > [0] http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/
>> > [1] http://lists.freedesktop.org/archives/dri-devel/2014-May/058858.html
>> >
>> > --
>> > Jan Vesely <jan.vesely@xxxxxxxxxxx>
>
> --
> Jan Vesely <jan.vesely@xxxxxxxxxxx>
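
As an addendum to the page table format quoted above, here is a rough sketch in C of the PTE/PDE bit packing and the fragment/block-size arithmetic. It is not the radeon driver's code; the macro and function names are made up for illustration and only restate the layout described above:

/* Rough sketch only -- not the radeon driver's code.  Bit positions
 * follow the PDE/PTE layout quoted above. */
#include <stdint.h>

#define PTE_VALID     (1ULL << 0)
#define PTE_SYSTEM    (1ULL << 1)  /* page is in system memory, not vram */
#define PTE_SNOOPED   (1ULL << 2)  /* snoop CPU caches (cacheable system memory) */
#define PTE_READABLE  (1ULL << 5)
#define PTE_WRITEABLE (1ULL << 6)
#define PTE_FRAG(f)   ((uint64_t)(f) << 7)      /* bits 11:7 */
#define ADDR_39_12(a) ((a) & 0xFFFFFFF000ULL)   /* bits 39:12 */

#define PDE_VALID     (1ULL << 0)

/* A readable/writable, snooped mapping of a system page at dma_addr. */
static uint64_t make_system_pte(uint64_t dma_addr, unsigned int frag)
{
    return ADDR_39_12(dma_addr) | PTE_FRAG(frag) |
           PTE_WRITEABLE | PTE_READABLE | PTE_SNOOPED | PTE_SYSTEM | PTE_VALID;
}

/* A PDE pointing at the PTB that starts at ptb_addr. */
static uint64_t make_pde(uint64_t ptb_addr)
{
    return ADDR_39_12(ptb_addr) | PDE_VALID;
}

/* fragment size in bytes = 2 ^ (12 + fragment): 0 -> 4k, 1 -> 8k, ... */
static uint64_t fragment_size(unsigned int frag)
{
    return 1ULL << (12 + frag);
}

/* With block size b, each PTB is (4k << b) bytes and holds (512 << b)
 * 8-byte PTEs, so covering the full 40-bit address space with 4k pages
 * takes 2^28 / (512 << b) PDEs. */
static uint64_t pdes_for_40bit_space(unsigned int block_size)
{
    uint64_t pages = 1ULL << (40 - 12);
    return pages / (512ULL << block_size);
}

For a vram page you would drop the system and snoop bits and use the GPU physical address of vram instead of a dma address.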