On 1/29/20 3:55 PM, Christian König wrote:
Am 24.01.20 um 10:09 schrieb Thomas Hellström (VMware):
From: Thomas Hellstrom <thellstrom@xxxxxxxxxx>
Support huge (PMD-size and PUD-size) page-table entries by providing a
huge_fault() callback.
We still support private mappings and write-notify by splitting the huge
page-table entries on write-access.
Note that for huge page-faults to occur, either the kernel needs to be
compiled with trans-huge-pages always enabled, or the kernel needs to be
compiled with trans-huge-pages enabled using madvise, and the user-space
app needs to call madvise() to enable trans-huge pages on a per-mapping
basis.
Furthermore huge page-faults will not succeed unless buffer objects and
user-space addresses are aligned on huge page size boundaries.
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: "Matthew Wilcox (Oracle)" <willy@xxxxxxxxxxxxx>
Cc: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Ralph Campbell <rcampbell@xxxxxxxxxx>
Cc: "Jérôme Glisse" <jglisse@xxxxxxxxxx>
Cc: "Christian König" <christian.koenig@xxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Signed-off-by: Thomas Hellstrom <thellstrom@xxxxxxxxxx>
Reviewed-by: Roland Scheidegger <sroland@xxxxxxxxxx>
---
drivers/gpu/drm/ttm/ttm_bo_vm.c | 145 ++++++++++++++++++++-
drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c | 2 +-
include/drm/ttm/ttm_bo_api.h | 3 +-
3 files changed, 145 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c
b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 389128b8c4dd..49704261a00d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -156,6 +156,89 @@ vm_fault_t ttm_bo_vm_reserve(struct
ttm_buffer_object *bo,
}
EXPORT_SYMBOL(ttm_bo_vm_reserve);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+/**
+ * ttm_bo_vm_insert_huge - Insert a pfn for PUD or PMD faults
+ * @vmf: Fault data
+ * @bo: The buffer object
+ * @page_offset: Page offset from bo start
+ * @fault_page_size: The size of the fault in pages.
+ * @pgprot: The page protections.
+ * Does additional checking whether it's possible to insert a PUD or
PMD
+ * pfn and performs the insertion.
+ *
+ * Return: VM_FAULT_NOPAGE on successful insertion,
VM_FAULT_FALLBACK if
+ * a huge fault was not possible, and a VM_FAULT_ERROR code otherwise.
+ */
+static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault *vmf,
+ struct ttm_buffer_object *bo,
+ pgoff_t page_offset,
+ pgoff_t fault_page_size,
+ pgprot_t pgprot)
+{
+ pgoff_t i;
+ vm_fault_t ret;
+ unsigned long pfn;
+ pfn_t pfnt;
+ struct ttm_tt *ttm = bo->ttm;
+ bool write = vmf->flags & FAULT_FLAG_WRITE;
+
+ /* Fault should not cross bo boundary. */
+ page_offset &= ~(fault_page_size - 1);
+ if (page_offset + fault_page_size > bo->num_pages)
+ goto out_fallback;
+
+ if (bo->mem.bus.is_iomem)
+ pfn = ttm_bo_io_mem_pfn(bo, page_offset);
+ else
+ pfn = page_to_pfn(ttm->pages[page_offset]);
+
+ /* pfn must be fault_page_size aligned. */
+ if ((pfn & (fault_page_size - 1)) != 0)
+ goto out_fallback;
+
+ /* Check that memory is contiguous. */
+ if (!bo->mem.bus.is_iomem)
+ for (i = 1; i < fault_page_size; ++i) {
+ if (page_to_pfn(ttm->pages[page_offset + i]) != pfn + i)
+ goto out_fallback;
+ }
+ /* IO mem without the io_mem_pfn callback is always contiguous. */
+ else if (bo->bdev->driver->io_mem_pfn)
+ for (i = 1; i < fault_page_size; ++i) {
+ if (ttm_bo_io_mem_pfn(bo, page_offset + i) != pfn + i)
+ goto out_fallback;
+ }
Maybe add {} to the if to make clear where things start/end.
+
+ pfnt = __pfn_to_pfn_t(pfn, PFN_DEV);
+ if (fault_page_size == (HPAGE_PMD_SIZE >> PAGE_SHIFT))
+ ret = vmf_insert_pfn_pmd_prot(vmf, pfnt, pgprot, write);
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+ else if (fault_page_size == (HPAGE_PUD_SIZE >> PAGE_SHIFT))
+ ret = vmf_insert_pfn_pud_prot(vmf, pfnt, pgprot, write);
+#endif
+ else
+ WARN_ON_ONCE(ret = VM_FAULT_FALLBACK);
+
+ if (ret != VM_FAULT_NOPAGE)
+ goto out_fallback;
+
+ return VM_FAULT_NOPAGE;
+out_fallback:
+ count_vm_event(THP_FAULT_FALLBACK);
+ return VM_FAULT_FALLBACK;
This doesn't seem to match the function documentation since we never
return ret here as far as I can see.
Apart from those comments it looks like that should work,
Christian.
Thanks for reviewing, Christian. I'll update the next version with your
feedback.
/Thomas