On Mon, 14 Aug 2023 16:53:09 +0100 Steven Price <steven.price@xxxxxxx> wrote: > > + > > +/** > > + * struct panthor_vm_op_ctx - VM operation context > > + * > > + * With VM operations potentially taking place in a dma-signaling path, we > > + * need to make sure everything that might require resource allocation is > > + * pre-allocated upfront. This is what this operation context is far. > > + * > > + * We also collect resources that have been freed, so we can release them > > + * asynchronously, and let the VM_BIND scheduler process the next VM_BIND > > + * request. > > + */ > > +struct panthor_vm_op_ctx { > > + /** @rsvd_page_tables: Pages reserved for the MMU page table update. */ > > + struct { > > + /** @count: Number of pages reserved. */ > > + u32 count; > > + > > + /** @ptr: Point to the first unused page in the @pages table. */ > > + u32 ptr; > > + > > + /** > > + * @page: Array of pages that can be used for an MMU page table update. > > + * > > + * After an VM operation, there might be free pages left in this array. > > + * They should be returned to the pt_cache as part of the op_ctx cleanup. > > + */ > > + void **pages; > > + } rsvd_page_tables; > > Two questions: > > 1) Would a mempool simplify the implementation? It looks like a > reasonable match. Not sure what you mean by mempool, but I'm using a kmem_cache here for all page table allocations. The pages that are passed to panthor_vm_op_ctx::rsvd_page_tables::pages are allocated from this pool. It's just that for each VM operation we pre-allocate page-tables, and release those that were not used when the operation is done (we over-allocate for the worst case scenario). > > 2) Does it really make sense to have a separate pool of memory for every > operation? Instead of having a separate pool for each operation, it > would be possible to just keep track of the total number needed for all > outstanding operations. Then a single (per device or maybe per-VM if > necessary) mempool could be resized to ensure it has the right amount of > space. The pool is per-driver (see the global pt_cache). rsvd_page_tables just holds pages needed for a specific VM operation. To be more specific, it holds pages for the worst case (page table tree is empty, except for the root page table). > > I'm also a little wary that the VM_BIND infrastructure could potentially > be abused to trigger a large amount of kernel allocation as it allocates > up-front for the worst case but those pages are not charged to the > process (AFAICT). But I haven't fully got my head round that yet. Yep, that's problematic, indeed. I considered allocating page tables as GEM objects, but the overhead of a GEM object is quite big (hundreds of bytes of meta-data) compared to the size of a page table (4k), and kmem_cache was just super convenient for this page table cache :-). > > > + > > + /** @flags: Combination of drm_panthor_vm_bind_op_flags. */ > > + u32 flags; > > + > > + /** @va: Virtual range targeted by the VM operation. */ > > + struct { > > + /** @addr: Start address. */ > > + u64 addr; > > + > > + /** @range: Range size. */ > > + u64 range; > > + } va; > > + > > + /** > > + * @returned_vmas: List of panthor_vma objects returned after a VM operation. > > + * > > + * For unmap operations, this will contain all VMAs that were covered by the > > + * specified VA range. > > + * > > + * For map operations, this will contain all VMAs that previously mapped to > > + * the specified VA range. 
> > + * > > + * Those VMAs, and the resources they point to will be released as part of > > + * the op_ctx cleanup operation. > > + */ > > + struct list_head returned_vmas; > > + > > + /** @map: Fields specific to a map operation. */ > > + struct { > > + /** @gem: GEM object information. */ > > + struct { > > + /** @obj: GEM object to map. */ > > + struct drm_gem_object *obj; > > + > > + /** @offset: Offset in the GEM object. */ > > + u64 offset; > > + } gem; > > + > > + /** > > + * @sgt: sg-table pointing to pages backing the GEM object. > > + * > > + * This is gathered at job creation time, such that we don't have > > + * to allocate in ::run_job(). > > + */ > > + struct sg_table *sgt; > > + > > + /** > > + * @prev_vma: Pre-allocated VMA object to deal with a remap situation. > > + * > > + * If the map request covers a region that's inside another VMA, the > > + * previous VMA will be split, requiring instantiation of a maximum of > > + * two new VMA objects. > > + */ > > + struct panthor_vma *prev_vma; > > + > > + /** > > + * @new_vma: The new VMA object that will be inserted to the VA tree. > > + */ > > + struct panthor_vma *new_vma; > > + > > + /** > > + * @next_vma: Pre-allocated VMA object to deal with a remap situation. > > + * > > + * See @prev_vma. > > + */ > > + struct panthor_vma *next_vma; > > It's probably premature optimization, but it feels like having a cache > of these VMA structures might be an idea. If it's needed, I'll probably go for a kmem_cache, but I need to check if it's worth it first (if the closest kmalloc cache is significantly bigger than the struct size). > I'm also struggling to > understand how both a new prev and new next VMA are needed - but I > haven't dug into the GPU VA manager. prev/next are for mapping splits: an object is already mapped, and a new object is mapped in the middle of this pre-existing mapping. In that case, we need two VMA objects for the preceding and succeeding mappings, since the old mapping object will be released. new_vma is for the new mapping (see the sketch at the end of this mail). > > > + } map; > > +}; > > + [...] > > +/** > > + * panthor_vm_active() - Flag a VM as active > > + * @VM: VM to flag as active. > > + * > > + * Assigns an address space to a VM so it can be used by the GPU/MCU. > > + * > > + * Return: 0 on success, a negative error code otherwise. > > + */ > > +int panthor_vm_active(struct panthor_vm *vm) > > +{ > > + struct panthor_device *ptdev = vm->ptdev; > > + struct io_pgtable_cfg *cfg = &io_pgtable_ops_to_pgtable(vm->pgtbl_ops)->cfg; > > + int ret = 0, as, cookie; > > + u64 transtab, transcfg; > > + > > + if (!drm_dev_enter(&ptdev->base, &cookie)) > > + return -ENODEV; > > + > > + mutex_lock(&ptdev->mmu->as.slots_lock); > > + > > + as = vm->as.id; > > + if (as >= 0) { > > + u32 mask = panthor_mmu_as_fault_mask(ptdev, as); > > + > > + if (ptdev->mmu->as.faulty_mask & mask) { > > + /* Unhandled pagefault on this AS, the MMU was > > + * disabled. We need to re-enable the MMU after > > + * clearing+unmasking the AS interrupts.
> > + */ > > + gpu_write(ptdev, MMU_INT_CLEAR, mask); > > + ptdev->mmu->as.faulty_mask &= ~mask; > > + gpu_write(ptdev, MMU_INT_MASK, ~ptdev->mmu->as.faulty_mask); > > + goto out_enable_as; > > + } > > + > > + goto out_unlock; > > + } > > + > > + /* Check for a free AS */ > > + if (vm->for_mcu) { > > + drm_WARN_ON(&ptdev->base, ptdev->mmu->as.alloc_mask & BIT(0)); > > + as = 0; > > + } else { > > + as = ffz(ptdev->mmu->as.alloc_mask | BIT(0)); > > + } > > + > > + if (!(BIT(as) & ptdev->gpu_info.as_present)) { > > + struct panthor_vm *lru_vm; > > + > > + lru_vm = list_first_entry_or_null(&ptdev->mmu->as.lru_list, > > + struct panthor_vm, > > + as.lru_node); > > + if (drm_WARN_ON(&ptdev->base, !lru_vm)) { > > + ret = -EBUSY; > > + goto out_unlock; > > + } > > + > > + list_del_init(&lru_vm->as.lru_node); > > + as = lru_vm->as.id; > > Should this not set lru_vm->as.id = -1, so that the code knows the VM no > longer has an address space? Good catch! > > > + } else { > > + set_bit(as, &ptdev->mmu->as.alloc_mask); > > + } > > + > > + /* Assign the free or reclaimed AS to the FD */ > > + vm->as.id = as; > > + ptdev->mmu->as.slots[as].vm = vm; > > + > > +out_enable_as: > > + transtab = cfg->arm_lpae_s1_cfg.ttbr; > > + transcfg = AS_TRANSCFG_PTW_MEMATTR_WB | > > + AS_TRANSCFG_PTW_RA | > > + AS_TRANSCFG_ADRMODE_AARCH64_4K; > > + if (ptdev->coherent) > > + transcfg |= AS_TRANSCFG_PTW_SH_OS; > > + > > + ret = panthor_mmu_as_enable(vm->ptdev, vm->as.id, transtab, transcfg, vm->memattr); > > + > > +out_unlock: > > + mutex_unlock(&ptdev->mmu->as.slots_lock); > > + drm_dev_exit(cookie); > > + return ret; > > +} > > + [...] > > + > > +static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status) > > +{ > > + status = panthor_mmu_fault_mask(ptdev, status); > > + while (status) { > > + u32 as = ffs(status | (status >> 16)) - 1; > > + u32 mask = panthor_mmu_as_fault_mask(ptdev, as); > > + u32 new_int_mask; > > + u64 addr; > > + u32 fault_status; > > + u32 exception_type; > > + u32 access_type; > > + u32 source_id; > > + > > + fault_status = gpu_read(ptdev, AS_FAULTSTATUS(as)); > > + addr = gpu_read(ptdev, AS_FAULTADDRESS_LO(as)); > > + addr |= (u64)gpu_read(ptdev, AS_FAULTADDRESS_HI(as)) << 32; > > + > > + /* decode the fault status */ > > + exception_type = fault_status & 0xFF; > > + access_type = (fault_status >> 8) & 0x3; > > + source_id = (fault_status >> 16); > > + > > + /* Page fault only */ > > This comment makes no sense - it looks like it's copied over from panfrost. Uh, it made sense before I dropped map/alloc-on-fault :-). > > If I understand correctly we don't (currently) support growing on page > fault - and it's not really needed now the MCU can handle the tiler heaps. Exactly. Map/alloc on fault is a bit challenging because of the whole 'we have to guarantee that a job is done in finite time, and we must make sure fence signaling is not blocked on allocation'. Given drm_gem_get_pages() doesn't do non-blocking allocations, I thought it'd be preferable to postpone map-on-fault until we actually decide we need it. Note that i915 seems to have some sort of non-blocking page allocator in shmem_sg_alloc_table()[1].
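To make the pre-allocation scheme a bit more concrete, here is a rough sketch of how the rsvd_page_tables fields quoted at the top are meant to be used (illustrative only, not the actual patch code, and the helper names are made up; pt_cache is the driver's global kmem_cache of page-table pages). All blocking allocations happen when the VM_BIND job is created; the dma-signaling path only consumes pages that were reserved for the worst case (empty page-table tree, except for the root), and whatever wasn't used goes back to the pt_cache when the op_ctx is cleaned up:

static int panthor_vm_op_ctx_prealloc_pts(struct panthor_vm_op_ctx *op_ctx,
					  u32 worst_case_count)
{
	u32 i;

	op_ctx->rsvd_page_tables.pages = kcalloc(worst_case_count,
						 sizeof(void *), GFP_KERNEL);
	if (!op_ctx->rsvd_page_tables.pages)
		return -ENOMEM;

	/* Blocking allocations are fine here: we are not in the
	 * dma-fence signaling path yet.
	 */
	for (i = 0; i < worst_case_count; i++) {
		op_ctx->rsvd_page_tables.pages[i] =
			kmem_cache_zalloc(pt_cache, GFP_KERNEL);
		if (!op_ctx->rsvd_page_tables.pages[i])
			goto err_free_pages;
	}

	op_ctx->rsvd_page_tables.count = worst_case_count;
	op_ctx->rsvd_page_tables.ptr = 0;
	return 0;

err_free_pages:
	while (i--)
		kmem_cache_free(pt_cache, op_ctx->rsvd_page_tables.pages[i]);
	kfree(op_ctx->rsvd_page_tables.pages);
	return -ENOMEM;
}

/* Called while the VM operation runs: never allocates, only consumes
 * reserved pages.
 */
static void *panthor_vm_op_ctx_get_pt(struct panthor_vm_op_ctx *op_ctx)
{
	if (op_ctx->rsvd_page_tables.ptr == op_ctx->rsvd_page_tables.count)
		return NULL;

	return op_ctx->rsvd_page_tables.pages[op_ctx->rsvd_page_tables.ptr++];
}

/* op_ctx cleanup: unused reservations are returned to the pt_cache. */
static void panthor_vm_op_ctx_return_pts(struct panthor_vm_op_ctx *op_ctx)
{
	u32 i;

	for (i = op_ctx->rsvd_page_tables.ptr;
	     i < op_ctx->rsvd_page_tables.count; i++)
		kmem_cache_free(pt_cache, op_ctx->rsvd_page_tables.pages[i]);

	kfree(op_ctx->rsvd_page_tables.pages);
}

The actual worst-case count depends on how many page-table levels a map/unmap of the requested VA range can touch, but the principle is the same: reserve early, consume without allocating, give back what's left.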
> > > + mutex_lock(&ptdev->mmu->as.slots_lock); > > + > > + new_int_mask = > > + panthor_mmu_fault_mask(ptdev, ~ptdev->mmu->as.faulty_mask); > > + > > + /* terminal fault, print info about the fault */ > > + drm_err(&ptdev->base, > > + "Unhandled Page fault in AS%d at VA 0x%016llX\n" > > + "raw fault status: 0x%X\n" > > + "decoded fault status: %s\n" > > + "exception type 0x%X: %s\n" > > + "access type 0x%X: %s\n" > > + "source id 0x%X\n", > > + as, addr, > > + fault_status, > > + (fault_status & (1 << 10) ? "DECODER FAULT" : "SLAVE FAULT"), > > + exception_type, panthor_exception_name(ptdev, exception_type), > > + access_type, access_type_name(ptdev, fault_status), > > + source_id); > > + > > + /* Ignore MMU interrupts on this AS until it's been > > + * re-enabled. > > + */ > > + ptdev->mmu->irq.mask = new_int_mask; > > + gpu_write(ptdev, MMU_INT_MASK, new_int_mask); > > + > > + /* Disable the MMU to kill jobs on this AS. */ > > + panthor_mmu_as_disable(ptdev, as); > > + mutex_unlock(&ptdev->mmu->as.slots_lock); > > + > > + status &= ~mask; > > + } > > +} > > +PANTHOR_IRQ_HANDLER(mmu, MMU, panthor_mmu_irq_handler); > > + [...] > > + > > +/** > > + * panthor_mmu_unplug() - Unplug the MMU logic > > + * @ptdev: Device. > > + * > > + * No access to the MMU regs should be done after this function is called. > > + * We suspend the IRQ and disable all VMs to guarantee that. > > + */ > > +void panthor_mmu_unplug(struct panthor_device *ptdev) > > +{ > > + if (ptdev->mmu->irq.irq > 0) > > In what situation is this not true? AFAICT the driver probe will fail if > the IRQ can't be obtained. Right, I'll drop this test. [1]https://elixir.bootlin.com/linux/v6.5/source/drivers/gpu/drm/i915/gem/i915_gem_shmem.c#L63
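Since the prev/next question above is easier to show than to describe, here is a stand-alone sketch of the remap/split case (made-up types and names, the real code goes through the GPU VA manager helpers): an existing mapping is partially replaced by a new one landing in the middle of it, the old VMA is released, so up to two fresh VMAs are needed for the parts that survive.

struct sketch_vma {
	u64 va_start, va_end;	/* VA range covered by the mapping */
	u64 bo_offset;		/* offset of va_start inside the GEM object */
};

static void sketch_remap_split(const struct sketch_vma *old,
			       u64 new_start, u64 new_end,
			       struct sketch_vma *prev_vma, bool *need_prev,
			       struct sketch_vma *next_vma, bool *need_next)
{
	/* Part of the old mapping left in front of the new one. */
	*need_prev = old->va_start < new_start;
	if (*need_prev) {
		prev_vma->va_start = old->va_start;
		prev_vma->va_end = new_start;
		prev_vma->bo_offset = old->bo_offset;
	}

	/* Part of the old mapping left after the new one. */
	*need_next = new_end < old->va_end;
	if (*need_next) {
		next_vma->va_start = new_end;
		next_vma->va_end = old->va_end;
		next_vma->bo_offset = old->bo_offset +
				      (new_end - old->va_start);
	}

	/* The old VMA itself goes to op_ctx->returned_vmas and is released
	 * once the operation is done; new_vma (also pre-allocated) covers
	 * [new_start, new_end).
	 */
}

Because we can't allocate in the signaling path, prev_vma, new_vma and next_vma are all pre-allocated in the op_ctx, even when the operation ends up not needing the split.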
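And for the record, the fix for the stale as.id you spotted in panthor_vm_active() should just be a matter of clearing the id on the VM we reclaim the AS from, something along these lines (untested sketch of the existing reclaim branch):

	lru_vm = list_first_entry_or_null(&ptdev->mmu->as.lru_list,
					  struct panthor_vm,
					  as.lru_node);
	if (drm_WARN_ON(&ptdev->base, !lru_vm)) {
		ret = -EBUSY;
		goto out_unlock;
	}

	list_del_init(&lru_vm->as.lru_node);
	as = lru_vm->as.id;

	/* The VM we reclaimed the AS from no longer has an address space. */
	lru_vm->as.id = -1;

The slots[as].vm pointer doesn't need the same treatment, since it gets overwritten a few lines below when the AS is re-assigned to the new VM.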