Hi,

thank you. I found a presentation on AMD APUs that mentions throughput differences between different types of memory. This information helps me a lot.

thanks again,
Jan

On Sat, 2015-02-21 at 10:24 -0500, Alex Deucher wrote:
> On Fri, Feb 20, 2015 at 7:21 PM, Jan Vesely <jan.vesely@xxxxxxxxxxx> wrote:
> > Hi,
> >
> > thank you for the exhaustive answer. I have a few more questions/clarifications:
> > is the DMA address used to access system pages further translated using the IOMMU (if present), or are GPUs treated specially?
> >
>
> Yes, the address may be further translated via an IOMMU. That's why I said dma address rather than bus address or physical address.
>
> > I have only seen references to TLB flushes, so I guess invalidating individual entries is not supported?
>
> You can flush individual vmids, but not individual entries.
>
> > does that mean that if a page needs to be moved/migrated, a complete VMID TLB flush is required?
>
> Yes.
>
> > I was a bit surprised to find out about PCIe cache snooping, since the work I have seen before assumes DMA is not cache coherent. I guess there's a latency penalty for using it; do you have any idea how much worse it gets (relative to non-snooped access)?
>
> It is slower than non-snooped. I don't remember the numbers off hand. I think some of the compute documents or HSA developer summit presentations on the AMD developer site go into the details.
>
> Alex
>
> > thanks again,
> > jan
> >
> > On Fri, 2015-02-20 at 17:19 -0500, Alex Deucher wrote:
> >> On Fri, Feb 20, 2015 at 12:35 PM, Jan Vesely <jan.vesely@xxxxxxxxxxx> wrote:
> >> > Hello radeon devs,
> >> >
> >> > I have been trying to find out more about the VM implementation on SI+ hw, but unfortunately I could not find much in the public documents [0].
> >> >
> >> > The SI ISA manual suggests that there is a limited form of privileged mode on these chips, so I wondered whether it could be used for VM management too (the docs only deal with numerical exceptions). Or does it always have to be handled by the host (driver)?
> >>
> >> These are related to trap/exception privilege, for debugging for example. I'm not that familiar with how that stuff works. It's unrelated to GPUVM.
> >>
> >> > One of the older patches [1] mentions different page sizes; is there any public documentation on things like the page table format and the GPU MMU hierarchy? I could only get a limited picture going through the code and comments.
> >>
> >> There is no public documentation on the VM hardware other than what is available in the driver, but I can try to give you an overview of how it works. There are 16 VM contexts (8 on cayman/TN/RL) on the GPU that can be active at any given time. GPUVM supports a 40-bit address space. Each context has an id; we call them vmids. vmid 0 is a bit special: it's called the system context and behaves a bit differently from the other ones. It's designed to be the kernel driver's view of GPU-accessible memory. I can go into further detail if you want, but I don't think it's critical for this discussion; just think of it as the context used by the kernel driver. That leaves 15 contexts (7 on cayman/TN/RL) available for use by user clients. vmid 0 has one set of configuration registers, and vmids 1-15 share the same configuration (other than the page tables); e.g., contexts 1-15 all have to use either single-level or two-level page tables.
> >> You select which VM context is used for a particular command buffer by a field in the command buffer packet. Some engines (e.g., UVD or the display hardware) do not support VM, so they always use vmid 0. Right now only the graphics, compute, and DMA engines support VM.
> >>
> >> With single-level page tables, you just have a big array of page table entries (PTEs) that represents the entire virtual address space. With multi-level page tables, the address space is represented by an array of page directory entries (PDEs) that point to page table blocks (PTBs), which are arrays of PTEs.
> >>
> >> PTEs and PDEs are 64 bits per entry.
> >>
> >> PDE:
> >> 39:12 - PTB address
> >> 0 - PDE valid (the entry is valid)
> >>
> >> PTE:
> >> 39:12 - page address
> >> 11:7 - fragment
> >> 6 - write
> >> 5 - read
> >> 2 - CPU cache snoop (for accessing cached system memory)
> >> 1 - system (page is in system memory rather than vram)
> >> 0 - PTE valid (the entry is valid)
> >>
> >> Fragment needs some explanation. The logical/physical fragment size in bytes = 2 ^ (12 + fragment), so a fragment value of 0 means 4k, 1 means 8k, etc. The logical address must be aligned to the fragment size, and the memory backing it must be contiguous and at least as large as the fragment size. Larger fragment sizes reduce the pressure on the TLB since fewer entries are required for the same amount of memory.
> >>
> >> For system pages, the page address is the dma address of the page. The system bit must be set, and the snoop bit can optionally be set depending on whether you are using cacheable memory.
> >>
> >> For vram pages, the address is the GPU physical address of vram (it starts at 0 on dGPUs and at MC_VM_FB_OFFSET (the dma address of the "vram" carve-out) on APUs).
> >>
> >> You can also adjust the page table block size, which controls the number of pages per PTB, i.e., how many PDEs you need to cover the address space. E.g., if you set the block size to 0, each PTB is 4k, which holds 512 PTEs; if you set it to 1, each PTB is 8k, which holds 1024 PTEs, etc.
> >>
> >> GPUVM is only concerned with memory management and protection. There are other protection features in other hw blocks for things beyond memory. For example, on CI and newer asics, the CP and SDMA blocks execute command buffers in a secure mode that limits them to accessing only registers that are relevant for those blocks (e.g., gfx or compute state registers, but not display registers) or to executing only certain packets.
> >>
> >> I hope this helps. Let me know if you have any more questions.
> >>
> >> Alex
> >>
> >> >
> >> > thank you,
> >> > Jan
> >> >
> >> > [0]http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/
> >> > [1]http://lists.freedesktop.org/archives/dri-devel/2014-May/058858.html
> >> >
> >> > --
> >> > Jan Vesely <jan.vesely@xxxxxxxxxxx>
> >
> > --
> > Jan Vesely <jan.vesely@xxxxxxxxxxx>

--
Jan Vesely <jan.vesely@xxxxxxxxxxx>
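A minimal sketch in C of the 64-bit PDE/PTE layout described above. The bit positions follow the layout given in the thread; the macro and helper names are invented for illustration and are not the radeon driver's actual symbols.

#include <stdint.h>
#include <stdio.h>

#define GPUVM_ADDR_MASK 0x000000FFFFFFF000ULL	/* bits 39:12 */

/* PDE: bits 39:12 hold the PTB address, bit 0 marks the entry valid. */
static uint64_t gpuvm_make_pde(uint64_t ptb_addr)
{
	return (ptb_addr & GPUVM_ADDR_MASK) | 1ULL;
}

/* PTE: bits 39:12 page address, 11:7 fragment, 6 write, 5 read,
 * 2 CPU cache snoop, 1 system, 0 valid. */
static uint64_t gpuvm_make_pte(uint64_t page_addr, unsigned fragment,
			       int writeable, int readable,
			       int system, int snooped)
{
	uint64_t pte = page_addr & GPUVM_ADDR_MASK;

	pte |= ((uint64_t)(fragment & 0x1f)) << 7;	/* bits 11:7 */
	if (writeable)
		pte |= 1ULL << 6;
	if (readable)
		pte |= 1ULL << 5;
	if (snooped)
		pte |= 1ULL << 2;	/* snoop CPU caches (cached system memory) */
	if (system)
		pte |= 1ULL << 1;	/* page is in system memory, not vram */
	return pte | 1ULL;		/* mark the entry valid */
}

int main(void)
{
	/* e.g. a read/write, snooped 4k system page at dma address 0x1234000,
	 * and a PDE pointing at a PTB at 0x2000 */
	printf("pte = 0x%016llx\n",
	       (unsigned long long)gpuvm_make_pte(0x1234000ULL, 0, 1, 1, 1, 1));
	printf("pde = 0x%016llx\n",
	       (unsigned long long)gpuvm_make_pde(0x2000ULL));
	return 0;
}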
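The fragment field reduces to a simple power-of-two rule, so a small helper makes the trade-off concrete. This is a sketch assuming only the 2 ^ (12 + fragment) relationship and the alignment/contiguity constraints quoted above; the function names are hypothetical.

#include <stdint.h>
#include <stdio.h>

/* fragment size in bytes = 2 ^ (12 + fragment): 0 -> 4k, 1 -> 8k, 4 -> 64k, ... */
static uint64_t gpuvm_fragment_bytes(unsigned fragment)
{
	return 1ULL << (12 + fragment);
}

/* Pick the largest fragment usable for a mapping: the virtual address must be
 * aligned to the fragment size and the backing memory must be contiguous and
 * at least that large, per the constraints described above. */
static unsigned gpuvm_pick_fragment(uint64_t va, uint64_t contig_bytes)
{
	unsigned frag = 0;

	while (frag < 27 &&	/* 5-bit field; stay well inside the 40-bit space */
	       gpuvm_fragment_bytes(frag + 1) <= contig_bytes &&
	       (va & (gpuvm_fragment_bytes(frag + 1) - 1)) == 0)
		frag++;
	return frag;
}

int main(void)
{
	/* A 64k-aligned, 64k-contiguous mapping can use fragment 4:
	 * one TLB entry instead of sixteen 4k entries. */
	unsigned frag = gpuvm_pick_fragment(0x10000ULL, 0x10000ULL);

	printf("fragment %u -> %llu bytes\n", frag,
	       (unsigned long long)gpuvm_fragment_bytes(frag));
	return 0;
}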
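Likewise, the page table block size determines how many PTEs fit in one PTB and therefore how many PDEs are needed to cover the 40-bit address space. A back-of-the-envelope sketch, again with invented names:

#include <stdint.h>
#include <stdio.h>

/* Each PTB holds 512 PTEs at block size 0 (4k PTB), 1024 at block size 1
 * (8k PTB), and so on, as described above. */
static uint64_t ptes_per_ptb(unsigned block_size)
{
	return 512ULL << block_size;
}

/* PDEs needed to cover the full 40-bit address space (2^28 4k pages). */
static uint64_t pdes_for_40bit_space(unsigned block_size)
{
	return (1ULL << 28) / ptes_per_ptb(block_size);
}

int main(void)
{
	unsigned bs;

	for (bs = 0; bs <= 4; bs++)
		printf("block size %u: %llu PTEs per PTB, %llu PDEs\n", bs,
		       (unsigned long long)ptes_per_ptb(bs),
		       (unsigned long long)pdes_for_40bit_space(bs));
	return 0;
}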