Re: [RFC] drm/ttm: dma: Fixes for 32-bit and 64-bit ARM

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/08/2014 04:14 PM, Alexandre Courbot wrote:
On 11/12/2014 09:39 PM, Thierry Reding wrote:
From: Thierry Reding <treding@xxxxxxxxxx>

dma_alloc_coherent() returns a kernel virtual address that is part of
the linear range. Passing such an address to virt_to_page() is illegal
on non-coherent architectures. This causes the kernel to oops on 64-bit
ARM because the struct page * obtained from virt_to_page() points to
unmapped memory.

This commit fixes this by using phys_to_page() since we get a physical
address from dma_alloc_coherent(). Note that this is not a proper fix
because if an IOMMU is set up to translate addresses for the GPU this
address will be an I/O virtual address rather than a physical one. The
proper fix probably involves not getting a pointer to the struct page
in the first place, but that would be a much more intrusive change, if
at all possible.

Until that time, this temporary fix will allow TTM to work on 32-bit
and 64-bit ARM as well, provided that no IOMMU translations are enabled
for the GPU.

Signed-off-by: Thierry Reding <treding@xxxxxxxxxx>
---
Arnd, I realize that this isn't a proper fix according to what we
discussed on
IRC yesterday, but I can't see a way to remove access to the pages
array that
would be as simple as this. I've marked this as RFC in the hope that
it will
trigger some discussion that will lead to a proper solution.

  drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 4 ++++
  1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
index c96db433f8af..d7993985752c 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -343,7 +343,11 @@ static struct dma_page
*__ttm_dma_alloc_page(struct dma_pool *pool)
                         &d_page->dma,
                         pool->gfp_flags);
      if (d_page->vaddr)
+#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+        d_page->p = phys_to_page(d_page->dma);
+#else
          d_page->p = virt_to_page(d_page->vaddr);
+#endif

Since I am messing with the IOMMU I just happened to have hit the issue
you are mentioning. Wouldn't the following work:

-       if (d_page->vaddr)
-#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
-               d_page->p = phys_to_page(d_page->dma);
-#else
-               d_page->p = virt_to_page(d_page->vaddr);
-#endif
-       else {
+       if (d_page->vaddr) {
+               if (is_vmalloc_addr(d_page->vaddr)) {
+                       d_page->p = vmalloc_to_page(d_page->vaddr);
+               } else {
+                       d_page->p = virt_to_page(d_page->vaddr);
+               }
+       } else {

A remapped page will end up in the vmalloc range of the address space,
and in this case we can use vmalloc_to_page() to get the right page.
Pages outside of this range are part of the linear mapping and can be
resolved using virt_to_page().

Jetson seems to be mostly happy with this, although I sometimes get the
following trace:

[   13.174763] kernel BUG at ../mm/slab.c:2593!
[   13.174767] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[   13.174790] Modules linked in: nouveau_platform(O+) nouveau(O)
cfbfillrect cfbimgblt cfbcopyarea ttm
...
[   13.175234] [<c00de238>] (cache_alloc_refill) from [<c00de528>]
(__kmalloc+0x100/0x13c)
[   13.175247] [<c00de528>] (__kmalloc) from [<c001d564>]
(arm_iommu_alloc_attrs+0x94/0x3a8)
[   13.175269] [<c001d564>] (arm_iommu_alloc_attrs) from [<bf008f4c>]
(ttm_dma_populate+0x498/0x76c [ttm])
[   13.175294] [<bf008f4c>] (ttm_dma_populate [ttm]) from [<bf000bb8>]
(ttm_tt_bind+0x38/0x68 [ttm])
[   13.175315] [<bf000bb8>] (ttm_tt_bind [ttm]) from [<bf00298c>]
(ttm_bo_handle_move_mem+0x408/0x47c [ttm])
[   13.175337] [<bf00298c>] (ttm_bo_handle_move_mem [ttm]) from
[<bf003758>] (ttm_bo_validate+0x220/0x22c [ttm])
[   13.175359] [<bf003758>] (ttm_bo_validate [ttm]) from [<bf003984>]
(ttm_bo_init+0x220/0x338 [ttm])
[   13.175480] [<bf003984>] (ttm_bo_init [ttm]) from [<bf0c70a0>]
(nouveau_bo_new+0x1c0/0x294 [nouveau])
[   13.175688] [<bf0c70a0>] (nouveau_bo_new [nouveau]) from [<bf0ce88c>]
(nv84_fence_create+0x1cc/0x240 [nouveau])
[   13.175891] [<bf0ce88c>] (nv84_fence_create [nouveau]) from
[<bf0cec90>] (nvc0_fence_create+0xc/0x24 [nouveau])
[   13.176094] [<bf0cec90>] (nvc0_fence_create [nouveau]) from
[<bf0c1480>] (nouveau_accel_init+0xec/0x450 [nouveau])

I suspect this is related to this change, but it might also be the
side-effect of another bug in my code.

FWIW and after some more testing, I noticed that without IOMMU vmalloc_to_page() and phys_to_page() both return the same valid page. With the IOMMU enabled vmalloc_to_page() still returns what seems to be valid pages (phys_to_page() of course doesn't make sense anymore).

So AFAICT the change I proposed seems valid. I am not sure what causes the BUG() in the slab allocator.
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel





[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux