On 29/08/2023 18:46, Boris Brezillon wrote:
> On Mon, 21 Aug 2023 12:31:29 +0100
> Steven Price <steven.price@xxxxxxx> wrote:
>
>> On 09/08/2023 17:53, Boris Brezillon wrote:
[...]
>>> + * // Collect signal operations on all jobs, such that each job can pick
>>> + * // from it for its dependencies and update the fence to signal when
>>> + * // the job is submitted.
>>
>> I can't figure out here how we avoid dependency loops within a batch.
>> What stops two jobs from each depending on each other?
>>
>> Or do we "allow" this but rely on the loop in
>> panthor_submit_ctx_add_deps_and_arm_jobs() to effectively enforce that
>> a job cannot actually depend on a job which is later in the batch.
>
> You can't have circular dependencies because the job fence is created
> after its dependencies have been registered, so a job at the beginning
> of the array can't depend on a job that's coming after. It might be
> passed the same syncobj, but since a syncobj is just a container, the
> fence attached to the syncobj at the time the first job adds it as a
> dependency will point to a different dma_fence.
>
>> In which case why bother with this complexity rather than just
>> performing all the steps on each job in order?
>
> Because, before submitting a set of jobs, we want to make sure all jobs
> that are passed to a submit request are valid and enough resources are
> available for their execution to proceed. We could allow partial
> execution (and that's actually the approach I had taken in one of the
> patches I proposed to allow submitting multiple jobs in one call to
> panfrost), but then you potentially have to figure out where things
> failed, not to mention that the syncobjs might point to intermediate
> dma_fence objects instead of the final one.
>
>> Being able to submit a forward dependency, but then having it ignored
>> seems like an odd design. So I feel like I must be missing something.
>
> It's not about allowing forward dependencies (that would be a mess),
> but allowing one job to take a dependency on a job that appears
> earlier in the job array of the same submit call.
>
>>> + *	ret = panthor_submit_ctx_collect_jobs_signal_ops(&ctx);
>
> Here panthor_submit_ctx_collect_jobs_signal_ops() is not registering
> job out_fences to the syncobjs, it's just collecting all signal
> operations from all jobs in an array. Each entry in this array contains
> the syncobj handle, the syncobj object, and the fence that was attached
> to it at the time the collection happens, and that's it.
>
> Now, when a job is populated, and after we've made sure it has
> everything it needs to be submitted, for each signal operation passed
> to this specific job, we update the corresponding entry in the signal
> array with the job finished fence, but the syncobj is not updated at
> that point, because we want to make sure all jobs belonging to a submit
> can be submitted before exposing their fences to the outside world.
>
> For jobs appearing later in the array, when we see a WAIT operation,
> we first check the signal array to see if there's a corresponding
> entry cached there for the given syncobj handle. If there is, we take
> the dma_fence from there (this dma_fence might come from a job
> submitted earlier in this submit context, or it might be the fence
> that was there initially); if not, we call drm_syncobj_find_fence() to
> get the dependency.
>
> Once all jobs have been parsed/checked/populated, we start the
> non-failing step => job submission. And after that point, we can start
> exposing the job fences to the outside world. This is what happens in
> panthor_submit_ctx_push_fences(): we iterate over the signal
> operations, and update each syncobj with the fence that was last
> attached to it (the last job in the submit array having a SIGNAL
> operation on that syncobj).

Thanks for the detailed explanation. I guess I hadn't considered the
benefits of checking everything is valid and obtaining resources before
submitting anything. That makes sense and I guess justifies this
complexity.
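Just to check I've followed the staging, the flow is roughly the below.
This is only my own paraphrase to confirm my understanding - the struct
layout and the find_signal_entry() helper are made up rather than taken
from the patch, so apologies if I've mangled any details:

/* One entry per syncobj handle that any job in this submit signals. */
struct signal_entry {
	u32 handle;
	struct drm_syncobj *syncobj;
	/* Fence attached to the syncobj at collection time; replaced by
	 * the finished fence of the last populated job that signals this
	 * syncobj. */
	struct dma_fence *fence;
};

/* Resolving a WAIT op: prefer the cached entry so a job can depend on a
 * job earlier in the same submit, otherwise fall back to whatever the
 * syncobj currently holds. */
static int resolve_wait_op(struct panthor_submit_ctx *ctx, u32 handle,
			   u64 point, struct dma_fence **fence)
{
	struct signal_entry *entry = find_signal_entry(ctx, handle);

	if (entry) {
		*fence = dma_fence_get(entry->fence);
		return 0;
	}

	return drm_syncobj_find_fence(ctx->file, handle, point, 0, fence);
}

/* The syncobjs themselves are only updated with the final fences once
 * every job has been queued, in panthor_submit_ctx_push_fences(). */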
[...]

>>> +static int
>>> +panthor_submit_ctx_add_job(struct panthor_submit_ctx *ctx, u32 idx,
>>> +			   struct drm_sched_job *job,
>>> +			   const struct drm_panthor_obj_array *syncs)
>>> +{
>>> +	struct panthor_device *ptdev = container_of(ctx->file->minor->dev,
>>> +						    struct panthor_device,
>>> +						    base);
>>> +	int ret;
>>> +
>>> +	if (drm_WARN_ON(&ptdev->base,
>>> +			idx >= ctx->job_count ||
>>> +			ctx->jobs[idx].job ||
>>> +			ctx->jobs[idx].syncops ||
>>> +			ctx->jobs[idx].syncop_count))
>>> +		return -EINVAL;
>>> +
>>> +	ctx->jobs[idx].job = job;
>>
>> While the WARN_ON obviously shouldn't happen, this positioning of the
>> ctx->jobs[].job assignment means the caller has no idea if the
>> assignment has happened. AFAICT in the case of the WARN_ON the job
>> isn't cleaned up properly.
>
> It's not really about cleanup not happening, more about being passed an
> index that was already populated.
>
>> The options I can see are to move this line further down (and make the
>> caller clean up that one job if this function fails), or to clean up
>> the job in the case where the WARN_ON fails.
>
> Maybe I should drop this WARN_ON() and assume the caller passed a valid
> index...

I'd be fine with that. My reordering suggestion is a bit pointless I
must admit ;)

[...]

>>> +
>>> +	for (u32 i = 0; i < sync_op_count; i++) {
>>> +		struct panthor_sync_signal *sig_sync;
>>> +		struct dma_fence *fence;
>>> +
>>> +		if (sync_ops[i].flags & DRM_PANTHOR_SYNC_OP_SIGNAL)
>>> +			continue;
>>
>> NIT: It might be worth having a helper for the operation type. It's a
>> little confusing that we have !(flags & SIGNAL) and (flags & SIGNAL)
>> but not (flags & WAIT) - obviously looking at the definition shows
>> why. Also there'll be a lot of careful refactoring needed if a third
>> operation is ever added.
>
> I had the operation as a separate field initially, but I couldn't think
> of any other operations we could do on a syncobj, so I decided to make
> it a flag, and mimic what Xe does.

A flag is fine, I just find it harder to read:

if (sync_ops[i].flags & DRM_PANTHOR_SYNC_OP_SIGNAL)
[...]
if (!(sync_ops[i].flags & DRM_PANTHOR_SYNC_OP_SIGNAL))

vs

bool is_signal_op(struct drm_panthor_sync_op *op)
{
	return !!(op->flags & DRM_PANTHOR_SYNC_OP_SIGNAL);
}

bool is_wait_op(struct drm_panthor_sync_op *op)
{
	return !(op->flags & DRM_PANTHOR_SYNC_OP_SIGNAL);
}

if (is_signal_op(&sync_ops[i]))
[...]
if (is_wait_op(&sync_ops[i]))

And it avoids anyone accidentally writing:

if (sync_ops[i].flags & DRM_PANTHOR_SYNC_OP_WAIT)

which in my quick test the compiler doesn't even warn on :(

Although on the subject of the flag, apparently the enumerator value
doesn't compile with -pedantic as it overflows into the sign bit:

include/drm/panthor_drm.h:237:31: warning: enumerator value for ‘DRM_PANTHOR_SYNC_OP_SIGNAL’ is not an integer constant expression [-Wpedantic]
  237 |         DRM_PANTHOR_SYNC_OP_SIGNAL = 1 << 31,

Changing it to "(int)(1u << 31)" seems to be a workaround. This affects
DRM_PANTHOR_VM_BIND_OP_TYPE_MASK too.
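For the SIGNAL one, something along these lines keeps -pedantic happy
in my quick test (I'm guessing at the enum name and omitting the other
members, this is only to illustrate the cast):

enum drm_panthor_sync_op_flags {
	/* ... other flag values unchanged ... */

	/* 1 << 31 overflows the (signed int) enumerator, which is what
	 * -Wpedantic objects to; doing the shift on an unsigned value and
	 * casting back keeps the same bit pattern. */
	DRM_PANTHOR_SYNC_OP_SIGNAL = (int)(1u << 31),
};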
[...]

>>> +#define PANTHOR_VM_CREATE_FLAGS		0
>>> +
>>> +static int panthor_ioctl_vm_create(struct drm_device *ddev, void *data,
>>> +				   struct drm_file *file)
>>> +{
>>> +	struct panthor_device *ptdev = container_of(ddev, struct panthor_device, base);
>>> +	u32 va_bits = GPU_MMU_FEATURES_VA_BITS(ptdev->gpu_info.mmu_features);
>>> +	struct panthor_file *pfile = file->driver_priv;
>>> +	struct drm_panthor_vm_create *args = data;
>>> +	u64 kernel_va_start = 0;
>>> +	int cookie, ret;
>>> +
>>> +	if (!drm_dev_enter(ddev, &cookie))
>>> +		return -ENODEV;
>>> +
>>> +	if (args->flags & ~PANTHOR_VM_CREATE_FLAGS) {
>>> +		ret = -EINVAL;
>>> +		goto out_dev_exit;
>>> +	}
>>> +
>>> +	if (drm_WARN_ON(ddev, !va_bits) || args->kernel_va_range > (1ull << (va_bits - 1))) {
>>
>> The check for !va_bits would be better done at probe time. I'd also be
>> tempted to move the check for kernel_va_range down to
>> panthor_vm_create() as that has to repeat the va_bits calculation.
>>
>>> +		ret = -EINVAL;
>>> +		goto out_dev_exit;
>>> +	}
>>> +
>>> +	if (args->kernel_va_range)
>>> +		kernel_va_start = (1 << (va_bits - 1)) - args->kernel_va_range;
>>
>> And also push the calculation of va_start down to panthor_vm_create()
>> as well.
>
> panthor_vm_create() is used internally, for the MCU VM creation, and
> I'd prefer to keep it uAPI agnostic. I don't mind moving it to
> panthor_vm_pool_create_vm() but we'd still have to do the va_bits
> calculation twice.

Ah true, for panthor_vm_create() you need to be able to pass in the VA
range for the MCU. We do have the "for_mcu" flag so the
CSF_MCU_SHARED_REGION_START/SIZE #defines could be used directly in
panthor_vm_create(). But I'd be happy with it in
panthor_vm_pool_create_vm() if you'd prefer.

Steve
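PS. For the kernel_va_start part, this is roughly the shape I was
picturing - a small helper that panthor_vm_pool_create_vm() could call.
Completely untested, and the helper name is my own invention:

static u64 panthor_kernel_va_start(struct panthor_device *ptdev,
				   u64 kernel_va_range)
{
	u32 va_bits = GPU_MMU_FEATURES_VA_BITS(ptdev->gpu_info.mmu_features);

	/* Same calculation as in panthor_ioctl_vm_create(): the kernel VA
	 * region ends at 1ull << (va_bits - 1), so it starts
	 * kernel_va_range below that. */
	return kernel_va_range ? (1ull << (va_bits - 1)) - kernel_va_range : 0;
}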