Re: [PATCH] drm/amd/amdgpu: solve the issue of allocate continuous pages under xen dom0

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Well this patch is a pretty clear NAK.

Using xen_swiotlb with userptr in the first place sounds illegal to me since we need coherent buffers which are incompatible with swiotlb.

In other words using userptrs in our driver stack are fundamentally incompatible with swiotlb, so it doesn't make sense to try o get this working using those workarounds.

So the real question is why are you trying to use userptr with XEN dom0? That won't work correctly at all.

Regards,
Christian.

Am 22.09.22 um 11:00 schrieb Xiao, Shane:
[AMD Official Use Only - General]

Hi Christian,

Why should the resulting sg table be too large?
[Shane]
sg_alloc_table_from_pages will set max_segment field as default value UINT_MAX  to sg_alloc_table_from_pages_segment. The filed max_segment works as follows:
“Contiguous ranges of the pages are squashed into a single scatterlist node up to the maximum size specified in @max_segment.”
If we don't set the max_segment field, the sg_alloc_table_from_pages may allocate 2M or more continuous ranges of pages.


For what too large?
[Shane]
However, these pages are called pseudo-physical pages on xen dom0, which means that the actual machine pages are not necessarily continuous.
When this happens, the xen_swiotlb will use bounce buffer to do dma operation by swiotlb_tbl_map_single.
But, the xen_swiotlb only allows IO_TLB_SEGSIZE*IO_TLB_SHIFT (256K) continuous pages, and the allocate 2M or more continuous ranges of pages will cause such error "swiotlb buffer is full".

BTW: intel uses the same method to allocate page tables in i915_gem_userptr_get_pages.

Best Regards,
Shane

-----Original Message-----
From: Koenig, Christian <Christian.Koenig@xxxxxxx>
Sent: Thursday, September 22, 2022 3:19 PM
To: Xiao, Shane <shane.xiao@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] drm/amd/amdgpu: solve the issue of allocate
continuous pages under xen dom0

Am 22.09.22 um 09:11 schrieb Shane Xiao:
[Why]
sg_alloc_table_from_pages alloc too large continuous PFN pages under
xen dom0.

Well that sentence doesn't make much sense. Why should the resulting sg
table be to large? For what to large?

Regards,
Christian.

However, xen should check continuous MFN pages in
range_straddles_page_boundary.
When range_straddles_page_boundary return false, some cases fall back
into swiotlb process and the continuous allocable page is not enough.

[How]
In fact, xen swiotlb set max_segment default value as UINT_MAX and
xen_swiotlb_init_early already change the value to PAGE_SIZE under xen
dom0.
However amdgpu driver doesn't use the value, which may cause issue
such as swiotlb buffer full. Add amd_sg_segment_size according to
iommu setting, the details are as follows:
	iommu setting		|	amd_sg_segment_size
-------------------------------------------------------------------------------
	iommu=on		|	UINT_MAX
      iommu=off && swiotlb on	|	IO_TLB_DEFAULT_SIZE(64M)
	xen_swiotlb on		|	PAGE_SIZE(4K)
----------------------------------------------------------------------
---------

Signed-off-by: Shane Xiao <shane.xiao@xxxxxxx>
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 22
++++++++++++++++++++--
   1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 134575a3893c..d081fcd22d6b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -80,6 +80,23 @@ static int amdgpu_ttm_init_on_chip(struct
amdgpu_device *adev,
   				  false, size_in_page);
   }

+static inline unsigned int amdgpu_sg_segment_size(void) {
+	unsigned int size = swiotlb_max_segment();
+
+	/* size=0 when amd iommu enabled */
+	if (size == 0)
+		size = UINT_MAX;
+
+	size = rounddown(size, PAGE_SIZE);
+	/* swiotlb_max_segment_size can return 1 byte when it means one
page. */
+	if (size < PAGE_SIZE)
+		size = PAGE_SIZE;
+
+	return size;
+}
+
+
   /**
    * amdgpu_evict_flags - Compute placement flags
    *
@@ -760,9 +777,10 @@ static int amdgpu_ttm_tt_pin_userptr(struct
ttm_device *bdev,
   	int r;

   	/* Allocate an SG array and squash pages into it */
-	r = sg_alloc_table_from_pages(ttm->sg, ttm->pages, ttm-
num_pages, 0,
-				      (u64)ttm->num_pages << PAGE_SHIFT,
+	r = sg_alloc_table_from_pages_segment(ttm->sg, ttm->pages, ttm-
num_pages, 0,
+				      (u64)ttm->num_pages << PAGE_SHIFT,
+amdgpu_sg_segment_size(),
   				      GFP_KERNEL);
+
   	if (r)
   		goto release_sg;





[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux