On 01/08/17 10:10 AM, Christian König wrote: > You need to cover multiple code path here: > 1. The one you currently implemented which uses ttm_dma_populate() and > pci_map_page(). > 2. The one using ttm_dma_populate(). I'll have to look into this one though it's my understanding that code path is only followed if there's no (hardware) IOMMU right? Which while it'd have to be covered isn't a high priority right now (our devel platforms have hardware IOMMU). None the less I'll look into it once I figure out #3 and #4 per your comments. > 3. The one using drm_prime_sg_to_page_addr_arrays() a bit above for > imported BOs. This one seems rather straight forward but the only catch is I don't have access to "adev" inside that drm function. Would it be taboo to do something like diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 376078334f54..cd97ef2144c9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -916,6 +916,12 @@ static int amdgpu_ttm_tt_populate(struct ttm_tt *ttm) drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages, gtt->ttm.dma_address, ttm->num_pages); ttm->state = tt_unbound; + for (i = 0; i < ttm->num_pages; i++) { + trace_amdgpu_ttm_tt_populate( + adev, + gtt->ttm.dma_address[i], + page_to_phys(ttm->pages[i])); + } return 0; } This would normally result in a for loop around a nop which shouldn't be a huge performance hit but is one none the less. > 4. The in amdgpu_ttm_tt_pin_userptr() which uses dma_map_sg() for userptrs. > > Basically all just take gtt->ttm.ttm.pages and fill gtt->ttm.dma_address. Same comment as #3. I can tackle this with a for loop around the trace if that is ok. > I suggest to add a helper you can call to trace all pages->dma_address > mappings inside a ttm. One thing at a time :-). That would probably be a bit better since the trace log gets filled with remappings (which is why my proof-of-concept umr patch only grabs the latest mapping as it reads the trace). Is there a handy place we can grab a list of ttm_tt's bound to our device? If so then in theory I could write a debugfs interface instead. Thanks for your patience as I'm definitely less familiar with the VM side of things :-) Cheers, Tom