Dave, Daniel.

Another one (the last for some time) of those pull requests that Linus
probably wants separate. The mm and ttm patches have been acked by the
maintainers for merge through drm, but see CAVEATS below:

CAVEATS:
- Patch 1/9 is trivial, but I can't get it acked or reviewed by fs
  people despite repeated efforts.
- There are two trivial conflicts (patch context) with linux-next.

This pull request is not urgent for 5.7. Although that would be
desirable, it may well wait for 5.8.

----------------------------------------------------------------------------
Huge page-table entries for TTM

In order to reduce CPU usage [1] and, in theory, TLB misses, this
patchset enables huge- and giant page-table entries for TTM and
TTM-enabled graphics drivers.

Patch 1 and 2 introduce a vma_is_special_huge() function to make the mm
code take the same path as DAX when splitting huge- and giant page-table
entries (which currently means zapping the page-table entry and relying
on re-faulting).

Patch 3 makes the mm code split existing huge page-table entries on
huge_fault fallbacks, typically on COW or on buffer-objects that want
write-notify. COW and write-notification are always done on the lowest
page-table level. See the patch log message for additional
considerations.

Patch 4 introduces functions to allow the graphics drivers to manipulate
the caching- and encryption flags of huge page-table entries without
ugly hacks.

Patch 5 implements the huge_fault handler in TTM.
This enables huge page-table entries, provided that the kernel is
configured to support transhuge pages, either by default or using
madvise(). However, they are unlikely to be inserted unless the kernel
buffer object pfns and user-space addresses align perfectly.
There are various options here, but since buffer objects that reside in
system pages typically start at huge page boundaries if they are backed
by huge pages, we try to enforce buffer object starting pfns and
user-space addresses to be huge page-size aligned if their size exceeds
a huge page-size. If pud-size transhuge ("giant") pages are enabled by
the arch, the same holds for those.

Patch 6 implements a specialized huge_fault handler for vmwgfx.
The vmwgfx driver may perform dirty-tracking and needs some special code
to handle that correctly.

Patch 7 implements a drm helper to align user-space addresses according
to the above scheme, if possible.

Patch 8 implements a TTM range manager for vmwgfx that does the same for
graphics IO memory. This may later be reused by other graphics drivers
if necessary.

Patch 9 finally hooks up the helpers of patch 7 and 8 to the vmwgfx
driver. A similar change is needed for graphics drivers that want a
reasonable likelihood of actually using huge page-table entries.

If a buffer object size is not huge-page or giant-page aligned,
its size will NOT be inflated by this patchset. This means that the
buffer object tail will use smaller size page-table entries and thus no
memory overhead occurs. Drivers that want to pay the memory overhead
price need to implement their own scheme to inflate buffer-object sizes.

PMD size huge page-table-entries have been tested with vmwgfx and found
to work well both with system memory backed and IO memory backed buffer
objects.

PUD size giant page-table-entries have seen limited (fault and COW)
testing using a modified kernel (to support 1GB page allocations) and a
fake vmwgfx TTM memory type. The vmwgfx driver does otherwise not
support 1GB-size IO memory resources.

[1]
The below test program generates the following gnu time output when run
on a vmwgfx-enabled kernel without the patch series:

4.78user 6.02system 0:10.91elapsed 99%CPU (0avgtext+0avgdata 1624maxresident)k
0inputs+0outputs (0major+640077minor)pagefaults 0swaps

and with the patch series:

1.71user 3.60system 0:05.40elapsed 98%CPU (0avgtext+0avgdata 1656maxresident)k
0inputs+0outputs (0major+20079minor)pagefaults 0swaps

A consistent reduction in the number of graphics page-faults can be seen
with normal graphics applications but, probably due to the aggressive
buffer object caching in vmwgfx user-space drivers, the CPU time
reduction is within error limits.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <sys/mman.h>
#include <xf86drm.h>

/* Print the failing step and bail out on a negative return value. */
static void checkerr(int ret, const char *name)
{
        if (ret < 0) {
                perror(name);
                exit(-1);
        }
}

int main(int argc, const char *argv[])
{
        struct drm_mode_create_dumb c_arg = {0};
        struct drm_mode_map_dumb m_arg = {0};
        struct drm_mode_destroy_dumb d_arg = {0};
        int ret, i, fd;
        void *map;

        fd = open("/dev/dri/card0", O_RDWR);
        checkerr(fd, argv[0]);

        for (i = 0; i < 10000; ++i) {
                /* Create a 1024x1024, 32 bpp dumb buffer. */
                c_arg.bpp = 32;
                c_arg.width = 1024;
                c_arg.height = 1024;
                ret = drmIoctl(fd, DRM_IOCTL_MODE_CREATE_DUMB, &c_arg);
                checkerr(ret, argv[0]);

                /* Look up the fake mmap offset of the buffer. */
                m_arg.handle = c_arg.handle;
                ret = drmIoctl(fd, DRM_IOCTL_MODE_MAP_DUMB, &m_arg);
                checkerr(ret, argv[0]);

                map = mmap(NULL, c_arg.size, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, m_arg.offset);
                checkerr(map == MAP_FAILED ? -1 : 0, argv[0]);

                /* Touch every page to trigger the fault handler. */
                (void) madvise((void *) map, c_arg.size, MADV_HUGEPAGE);
                memset(map, 0x67, c_arg.size);
                munmap(map, c_arg.size);

                d_arg.handle = c_arg.handle;
                ret = drmIoctl(fd, DRM_IOCTL_MODE_DESTROY_DUMB, &d_arg);
                checkerr(ret, argv[0]);
        }

        close(fd);
        return 0;
}

----------------------------------------------------------------------------
The following changes since commit cb7adfd6ad12a11902ebe374bec7fd4efa2cec1c:

  Merge tag 'mediatek-drm-next-5.7' of https://github.com/ckhu-mediatek/linux.git-tags into drm-next (2020-03-20 13:08:38 +1000)

are available in the Git repository at:

  git://people.freedesktop.org/~thomash/linux ttm-transhuge

for you to fetch changes up to 9431042dbc8ce490d49c7f9d5142805b6249208b:

  drm/vmwgfx: Hook up the helpers to align buffer objects (2020-03-24 18:50:35 +0100)

----------------------------------------------------------------
Thomas Hellstrom (VMware) (9):
      fs: Constify vma argument to vma_is_dax
      mm: Introduce vma_is_special_huge
      mm: Split huge pages on write-notify or COW
      mm: Add vmf_insert_pfn_xxx_prot() for huge page-table entries
      drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
      drm/vmwgfx: Support huge page faults
      drm: Add a drm_get_unmapped_area() helper
      drm/vmwgfx: Introduce a huge page aligning TTM range manager
      drm/vmwgfx: Hook up the helpers to align buffer objects

 drivers/gpu/drm/drm_file.c                 | 141 ++++++++++++++++++++++++
 drivers/gpu/drm/ttm/ttm_bo_vm.c            | 161 +++++++++++++++++++++++++++-
 drivers/gpu/drm/vmwgfx/Makefile            |   1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c        |  13 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h        |  12 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c |  76 ++++++++++++-
 drivers/gpu/drm/vmwgfx/vmwgfx_thp.c        | 166 +++++++++++++++++++++++++++++
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c |   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c   |   5 +-
 include/drm/drm_file.h                     |   9 ++
 include/drm/ttm/ttm_bo_api.h               |   3 +-
 include/linux/fs.h                         |   2 +-
 include/linux/huge_mm.h                    |  41 ++++++-
 include/linux/mm.h                         |  17 +++
 mm/huge_memory.c                           |  44 ++++++--
 mm/memory.c                                |  27 +++--
 16 files changed, 692 insertions(+), 28 deletions(-)
 create mode 100644 drivers/gpu/drm/vmwgfx/vmwgfx_thp.c