Hi Steve, On Fri, 8 Dec 2023 14:28:05 +0000 Steven Price <steven.price@xxxxxxx> wrote: > > +/** > > + * alloc_pt() - Custom page table allocator > > + * @cookie: Cookie passed at page table allocation time. > > + * @size: Size of the page table. This size should be fixed, > > + * and determined at creation time based on the granule size. > > + * @gfp: GFP flags. > > + * > > + * We want a custom allocator so we can use a cache for page table > > + * allocations and amortize the cost of the over-reservation that's > > + * done to allow asynchronous VM operations. > > + * > > + * Return: non-NULL on success, NULL if the allocation failed for any > > + * reason. > > + */ > > +static void *alloc_pt(void *cookie, size_t size, gfp_t gfp) > > +{ > > + struct panthor_vm *vm = cookie; > > + void *page; > > + > > + /* Allocation of the root page table happening during init. */ > > + if (unlikely(!vm->pgtbl_ops)) { > > I'm not that keen on using pgtbl_ops as the proxy for this. Can we use > root_page_table instead? Definitely, I actually intended to test ->root_page_table when I introduced this field, but somehow forgot to update this part of the code. > > At the moment if the IOMMU code ever did multiple allocations during > alloc_io_pgtable_ops() then we'd overwrite root_page_table and screw up > on the free path. > > If we use root_page_table == NULL as the check then things will > 'cleanly' fail by falling through to the non-root case in that case. > > Of course this really looks like we should have had a different > allocator for the root table but I'm not (re)opening that can of worms! ;) > > And of course it doesn't make any sense for the IOMMU code to do > multiple allocations so this is all rather academic - but maybe one day > there will be a different page table structure (16K pages maybe?). > > > + struct page *p; > > + > > + drm_WARN_ON(&vm->ptdev->base, vm->op_ctx); > > + p = alloc_pages_node(dev_to_node(vm->ptdev->base.dev), > > + gfp | __GFP_ZERO, get_order(size)); > > + page = p ? page_address(p) : NULL; > > + vm->root_page_table = page; > > + return page; > > + } > > + > > + /* We're not supposed to have anything bigger than 4k here, because we picked a > > + * 4k granule size at init time. > > + */ > > + if (drm_WARN_ON(&vm->ptdev->base, size != SZ_4K)) > > + return NULL; > > + > > + /* We must have some op_ctx attached to the VM and it must have at least one > > + * free page. > > + */ > > + if (drm_WARN_ON(&vm->ptdev->base, !vm->op_ctx) || > > + drm_WARN_ON(&vm->ptdev->base, > > + vm->op_ctx->rsvd_page_tables.ptr >= vm->op_ctx->rsvd_page_tables.count)) > > + return NULL; > > + > > + page = vm->op_ctx->rsvd_page_tables.pages[vm->op_ctx->rsvd_page_tables.ptr++]; > > + memset(page, 0, SZ_4K); > > + > > + /* Page table entries don't use virtual addresses, which trips out > > + * kmemleak. kmemleak_alloc_phys() might work, but physical addresses > > + * are mixed with other fields, and I fear kmemleak won't detect that > > + * either. > > + * > > + * Let's just ignore memory passed to the page-table driver for now. > > + */ > > + kmemleak_ignore(page); > > + return page; > > +} > > + > > +/** > > + * @free_pt() - Custom page table free function > > + * @cookie: Cookie passed at page table allocation time. > > + * @data: Page table to free. > > + * @size: Size of the page table. This size should be fixed, > > + * and determined at creation time based on the granule size. > > + */ > > +static void free_pt(void *cookie, void *data, size_t size) > > +{ > > + struct panthor_vm *vm = cookie; > > + > > + if (unlikely(vm->root_page_table == data)) { > > + free_pages((unsigned long)data, get_order(size)); > > Maybe add "vm->root_page_table = NULL;"? Sure. > > > + return; > > + } [...] > > +/** > > + * panthor_vm_alloc_va() - Allocate a region in the auto-va space > > + * @VM: VM to allocate a region on. > > + * @size: Size of the region. > > kerneldoc needs updating for the new arguments. Will fix. > > > + * > > + * Some GPU objects, like heap chunks, are fully managed by the kernel and > > + * need to be mapped to the userspace VM, in the region reserved for kernel > > + * objects. > > + * > > + * This function takes care of allocating a region in this reserved space. > > + * > > + * Return: A valid pointer on success, and ERR_PTR() otherwise. > > Returns an error code not a pointer. And that too. > > > + */ > > +int > > +panthor_vm_alloc_va(struct panthor_vm *vm, u64 va, u64 size, > > + struct drm_mm_node *va_node) > > +{ > > + int ret; > > + > > + if (!size || (size & ~PAGE_MASK)) > > + return -EINVAL; > > + > > + if (va != PANTHOR_VM_KERNEL_AUTO_VA && (va & ~PAGE_MASK)) > > + return -EINVAL; > > + > > + mutex_lock(&vm->mm_lock); > > + if (va != PANTHOR_VM_KERNEL_AUTO_VA) { > > + memset(va_node, 0, sizeof(*va_node)); > > This memset() seems redundant. If we assume the va_node is initialized to zero, it's indeed redundant. I'll update the doc to make this a caller requirement. > I certainly can't see why it's only > required on this path. drm_mm_insert_node_in_range() seems to assign all fields explicitly, while, according to the doc [1], drm_mm_reserve_node() wants the caller to make sure the struct is zero-initialized, except for the start and size fields. > > > + va_node->start = va; > > + va_node->size = size; > > + ret = drm_mm_reserve_node(&vm->mm, va_node); > > + } else { > > + ret = drm_mm_insert_node_in_range(&vm->mm, va_node, size, > > + size >= SZ_2M ? SZ_2M : SZ_4K, > > + 0, vm->kernel_auto_va.start, > > + vm->kernel_auto_va.end, > > + DRM_MM_INSERT_BEST); > > + } > > + mutex_unlock(&vm->mm_lock); > > + > > + return ret; > > +} [...] > > +/* > > + * Only 32 VMs per open file. If that becomes a limiting factor, we can > > + * increase this number. > > + */ > > +#define PANTHOR_MAX_VMS_PER_FILE 32 > > + > > +/** > > + * panthor_vm_pool_create_vm() - Create a VM > > + * @pool: The VM to create this VM on. > > + * @kernel_va_start: Start of the region reserved for kernel objects. > > + * @kernel_va_range: Size of the region reserved for kernel objects. > > + * > > + * Return: 0 on success, a negative error code otherwise. > > Actually returns the (positive) id on success. Will fix. > > > + */ > > +int panthor_vm_pool_create_vm(struct panthor_device *ptdev, > > + struct panthor_vm_pool *pool, > > + struct drm_panthor_vm_create *args) > > +{ > > + u64 kernel_va_start, kernel_va_range; > > + struct panthor_vm *vm; > > + int ret; > > + u32 id; > > + > > + ret = panthor_vm_create_check_args(ptdev, args, &kernel_va_start, &kernel_va_range); > > + if (ret) > > + return ret; > > + > > + vm = panthor_vm_create(ptdev, false, kernel_va_start, kernel_va_range, > > + kernel_va_start, kernel_va_range); > > + if (IS_ERR(vm)) > > + return PTR_ERR(vm); > > + > > + ret = xa_alloc(&pool->xa, &id, vm, > > + XA_LIMIT(1, PANTHOR_MAX_VMS_PER_FILE), GFP_KERNEL); > > + > > + if (ret) { > > + panthor_vm_put(vm); > > + return ret; > > + } > > + > > + args->user_va_range = kernel_va_start; > > + return id; > > +} [...] > > +/** > > + * panthor_vm_put() - Release a reference on a VM > > + * @vm: VM to release the reference on. Can be NULL. > > + */ > > +void panthor_vm_put(struct panthor_vm *vm) > > +{ > > + static_assert(offsetof(struct panthor_vm, base) == 0); > > Yuk! ;) > > I'd prefer: > > drm_gpuvm_put(vm ? &vm->base : NULL); > > which my compiler turns into the same thing rather than relying on the > type punning. You can keep the static_assert if you like, but I don't > like relying on it for correct code generation. Although I'll admit I > couldn't actually get the compiler to produce incorrect code when I tried. Sure, I'll pick your suggestion here. > > > + drm_gpuvm_put(&vm->base); > > +} [...] > > + > > +/** > > + * panthor_vm_prepare_mapped_bos_resvs() - Prepare resvs on VM BOs. > > + * @exec: Locking/preparation context. > > + * @vm: VM targeted by the GPU job. > > + * @slot_count: Number of slots to reserve. > > + * > > + * GPU jobs assume all BOs bound to the VM at the time the job is submitted > > + * are available when the job is executed. In order to guarantee that, we > > + * need to reserve a slot on all BOs mapped to a VM and update this slot with > > + * the job fence after its submission. > > + * > > + * Return: 0 on success, a negative error code otherwise. > > + */ > > +int panthor_vm_prepare_mapped_bos_resvs(struct drm_exec *exec, struct panthor_vm *vm, > > + u32 slot_count) > > +{ > > + int ret; > > + > > + /* Acquire the VM lock an reserve a slot for this GPU job. */ > > + ret = drm_gpuvm_prepare_vm(&vm->base, exec, slot_count); > > + if (ret) > > + return ret; > > + > > + /* VM operations are not protected by the VM resv-lock. We need to > > + * take the op_lock to make sure the shared_bos list is not updated > > + * while we're walking it. > > + */ > > Is the above comment stale? AFAIK the shared_bos list doesn't exist > anymore and this doesn't appear to relate to anything here. Oops, indeed. That predates to transition to drm_gpuvm for the VM <-> BO association. Thanks for the review! [1]https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_mm.c#L441