On Fri, Feb 20, 2015 at 12:35 PM, Jan Vesely <jan.vesely@xxxxxxxxxxx> wrote:
> Hello radeon devs,
>
> I have been trying to find out more about VM implementation on SI+ hw,
> but unfortunately I could not find much in the public documents[0].
>
> SI ISA manual suggests that there is a limited form of privileged mode
> on these chips, so I wondered if it could be used for VM management too
> (the docs only deal with numerical exceptions). Or does it always have
> to be handled by host (driver)?

Those are related to trap/exception privilege, e.g., for debugging. I'm
not that familiar with how that stuff works. It's unrelated to GPUVM.

>
> One of the older patches [1] mentions different page sizes, is there any
> public documentation on things like page table format, and GPU MMU
> hierarchy? I could only get limited picture going through the code and
> comments.

There is not any public documentation on the VM hardware other than
what is available in the driver. I can try and give you an overview
of how it works.

There are 16 VM contexts (8 on cayman/TN/RL) on the GPU that can be
active at any given time. GPUVM supports a 40 bit address space. Each
context has an id; we call them vmids. vmid 0 is a bit special. It's
called the system context and behaves a bit differently from the other
ones. It's designed to be the kernel driver's view of GPU accessible
memory. I can go into further detail if you want, but I don't think
it's critical for this discussion. Just think of it as the context
used by the kernel driver. So that leaves 15 contexts (7 on cayman and
TN/RL) available for use by user clients. vmid 0 has one set of
configuration registers and vmids 1-15 share the same configuration
(other than the page tables). E.g., contexts 1-15 all have to use
either single-level or 2-level page tables. You select which VM context
is used for a particular command buffer by a field in the command
buffer packet.
Some engines (e.g., UVD or the display hardware) do not support VM so
they always use vmid 0. Right now only the graphics, compute, and DMA
engines support VM.

With single level page tables, you just have a big array of page table
entries (PTEs) that represent the entire virtual address space. With
multi-level page tables, the address space is represented by an array
of page directory entries (PDEs) that point to page table blocks
(PTBs), which are arrays of PTEs.

PTEs and PDEs are 64 bits per entry.

PDE:
39:12 - PTB address
0 - PDE valid (the entry is valid)

PTE:
39:12 - page address
11:7 - fragment
6 - write
5 - read
2 - CPU cache snoop (for accessing cached system memory)
1 - system (page is in system memory rather than vram)
0 - PTE valid (the entry is valid)

Fragment needs some explanation. The logical/physical fragment size
in bytes = 2 ^ (12 + fragment). A fragment value of 0 means 4k, 1
means 8k, etc. The logical address must be aligned to the fragment
size and the memory backing it must be contiguous and at least as
large as the fragment size. Larger fragment sizes reduce the pressure
on the TLB since fewer entries are required for the same amount of
memory.

For system pages, the page address is the dma address of the page.
The system bit must be set and the snoop bit can be optionally set
depending on whether you are using cacheable memory.

For vram pages, the address is the GPU physical address of vram
(starts at 0 on dGPUs, starts at MC_VM_FB_OFFSET (dma address of
"vram" carve out) on APUs).

You can also adjust the page table block size, which controls the
number of pages per PTB, i.e., how many PDEs you need to cover the
address space. E.g., if you set the block size to 0, each PTB is 4k
which holds 512 PTEs; if you set it to 1, each PTB is 8k which holds
1024 PTEs, etc.

GPUVM is only concerned with memory management and protection. There
are other protection features in other hw blocks for things beyond
memory.
For example, on CI and newer asics, the CP and SDMA blocks execute
command buffers in a secure mode that limits them to accessing only
registers that are relevant for those blocks (e.g., gfx or compute
state registers, but not display registers) or only executing certain
packets.

I hope this helps. Let me know if you have any more questions.

Alex

>
>
> thank you,
> Jan
>
> [0]http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/
> [1]http://lists.freedesktop.org/archives/dri-devel/2014-May/058858.html
>
>
> --
> Jan Vesely <jan.vesely@xxxxxxxxxxx>
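
To make the PDE/PTE layout and the fragment and block size arithmetic
described in the reply concrete, here is a minimal sketch in Python. It
is only an illustration of the bit positions listed in the reply, not
driver code: every name in it (make_pte, make_pde, fragment_size,
ptes_per_ptb and the constants) is hypothetical, and the general
formula 512 << block_size is an assumption extrapolated from the two
data points given (0 -> 512 PTEs, 1 -> 1024 PTEs).

```python
# Sketch of the 64-bit PDE/PTE encoding described above.
# All names are hypothetical; only the bit positions come from the text.

PTE_VALID = 1 << 0    # bit 0: entry is valid (same bit for PDEs)
PTE_SYSTEM = 1 << 1   # bit 1: page is in system memory rather than vram
PTE_SNOOP = 1 << 2    # bit 2: CPU cache snoop
PTE_READ = 1 << 5     # bit 5: read permission
PTE_WRITE = 1 << 6    # bit 6: write permission
FRAG_SHIFT = 7        # fragment field occupies bits 11:7
FRAG_MAX = 0x1F       # 5-bit field
ADDR_MASK = ((1 << 40) - 1) & ~0xFFF  # bits 39:12 hold the address


def fragment_size(fragment):
    """Fragment size in bytes: 2 ^ (12 + fragment)."""
    return 1 << (12 + fragment)


def ptes_per_ptb(block_size):
    """PTEs per page table block (assumed pattern: 512, 1024, ...)."""
    return 512 << block_size


def make_pde(ptb_addr):
    """Build a valid PDE pointing at a 4k-aligned PTB address."""
    if ptb_addr & ~ADDR_MASK:
        raise ValueError("PTB address must be 4k aligned and fit in 40 bits")
    return ptb_addr | PTE_VALID


def make_pte(addr, fragment=0, read=True, write=True,
             system=False, snoop=False):
    """Build a valid PTE for a 4k-aligned page address."""
    if addr & ~ADDR_MASK:
        raise ValueError("page address must be 4k aligned and fit in 40 bits")
    if not 0 <= fragment <= FRAG_MAX:
        raise ValueError("fragment must fit in bits 11:7")
    pte = addr | (fragment << FRAG_SHIFT) | PTE_VALID
    if read:
        pte |= PTE_READ
    if write:
        pte |= PTE_WRITE
    if system:
        pte |= PTE_SYSTEM
    if snoop:
        pte |= PTE_SNOOP
    return pte


if __name__ == "__main__":
    # A read/write vram page at GPU physical address 0x1000.
    print(hex(make_pte(0x1000)))
    # A snooped system page using a 64k fragment (fragment = 4).
    print(hex(make_pte(0x200000, fragment=4, system=True, snoop=True)))
    print(fragment_size(4), ptes_per_ptb(1))
```

The sketch does not check that the logical address is aligned to the
fragment size or that the backing memory is contiguous; those
constraints apply to the mapping as a whole and would have to be
enforced by whatever code fills in a run of PTEs.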