Hi,

Adding Daniel as CC to comment below.

On Thu, 2016-02-18 at 14:22 +0000, John Harrison wrote:
> On 20/01/2016 13:18, Joonas Lahtinen wrote:
> > On Mon, 2016-01-11 at 18:42 +0000, John.C.Harrison@xxxxxxxxx wrote:
> > > From: John Harrison <John.C.Harrison@xxxxxxxxx>
> > >
> > > <SNIP>
> > >
> > > +		this = node->saved_objects + i;
> > > +
> > > +		for (j = 0; j < test->num_objs; j++) {
> > > +			that = test->saved_objects + j;
> > > +
> > > +			if (this->obj != that->obj)
> > > +				continue;
> >
> > How about VMAs? There might be multiple mappings to an object, isn't it
> > enough to depend on the required VMA instead of the whole object?
>
> The object is what we get coming in from user land through the IOCTL. So
> why make things more complicated? If there are multiple VMAs referring
> to the same object then we can't just track an individual VMA as that
> would lose the dependency on all the other VMAs. Just because the
> object is mapped to someone else's address space doesn't mean that this
> batch buffer can't overwrite data they are reading.
>

Right, makes sense.

> > > +int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
> > > +{
> > > +	struct drm_i915_private *dev_priv = qe->params.dev->dev_private;
> > > +	struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > +	struct intel_engine_cs *ring = qe->params.ring;
> > > +	struct i915_scheduler_queue_entry *node;
> > > +	struct i915_scheduler_queue_entry *test;
> > > +	unsigned long flags;
> > > +	bool not_flying;
> > > +	int i, r;
> > > +	int incomplete = 0;
> > > +
> > > +	WARN_ON(!scheduler);
> > > +
> >
> > This kind of situation should be a BUG_ON, because scheduler
> > being zero is literally going to cause an OOPS in the next dereference,
> > which is going to happen unconditionally. WARN + OOPS is kind of what
> > BUG_ON should be used to avoid. But this should be removed anyway after
> > scheduler is made a data member of dev_priv.
>
> The WARNs were originally BUGs but Daniel Vetter had the opposite
> demand. His view was the driver should never BUG under any
> circumstances. A WARN followed by an oops is better than a BUG because
> maybe it won't actually oops.
>

WARN_ON is better than BUG_ON when there won't be an immediate OOPS. But
if you're doing a null pointer dereference, like here if scheduler is
NULL, it is 100% sure it is going to either OOPS or cause horrible
undefined behaviour (on a non-x86 platform).

Something like;

	if (WARN_ON(!a))
		return -ENODEV;

could be what Daniel meant (if the error is propagated up and somehow
dealt with), but;

	WARN_ON(!a);
	a->foo = bar;

should simply be BUG_ON(!a), because otherwise it'll just end up being
WARNING + OOPS right after each other.

> >
> > > +	if (1/*i915.scheduler_override & i915_so_direct_submit*/) {
> >
> > I assume this is going to be addressed in a future commit. Could have
> > been introduced in this patch, too.
> >
> > > +		int ret;
> > > +
> > > +		scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
> > > +		ret = dev_priv->gt.execbuf_final(&qe->params);
> > > +		scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
> > > +
> >
> > The kerneldoc should mention locking requirements of this function.
> >
> > > +		/*
> > > +		 * Don't do any clean up on failure because the caller will
> > > +		 * do it all anyway.
> > > +		 */
> > > +		if (ret)
> > > +			return ret;
> > > +
> > > +		/* Free everything that is owned by the QE structure: */
> > > +		kfree(qe->params.cliprects);
> > > +		if (qe->params.dispatch_flags & I915_DISPATCH_SECURE)
> > > +			i915_gem_execbuff_release_batch_obj(qe->params.batch_obj);
> > > +
> > > +		return 0;
> >
> > Above piece of code looks like its own function, so it should probably
> > be one.
> >
> > > +	}
> > > +
> > > +	node = kmalloc(sizeof(*node), GFP_KERNEL);
> > > +	if (!node)
> > > +		return -ENOMEM;
> > > +
> > > +	*node = *qe;
> >
> > Any reason we can't simply move ownership of qe?
> > If not, I'd rather make a clone function
>
> The qe pointer passed in is a reference to a stack-local object in the
> execbuff code path. Thus ownership cannot be transferred. Doing it this
> way keeps the execbuff code nice and simple and all the dynamic memory
> management and list tracking is self-contained within the scheduler.
>

I would indicate this with a const qualifier in the function parameter.

> > > +
> > > +static int i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node)
> > > +{
> > > +	struct drm_i915_private *dev_priv = node->params.dev->dev_private;
> > > +	struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > +	struct intel_engine_cs *ring;
> > > +
> > > +	WARN_ON(!scheduler);
> > > +	WARN_ON(!node);
> > > +	WARN_ON(node->status != i915_sqs_popped);
> >
> > Other states had their I915_SQS_IS_* macro, why don't some?
>
> The purpose of the macro is to allow the combining of individual states
> into classes. E.g. dead and complete can both be considered complete for
> the majority of cases. Only in certain situations do you need to know
> that it really was dead. Hence most places that don't really care just
> use the merged macros, whereas places like this that do care use the
> explicit enum value.
>

Right, just asking because having a macro might increase consistency.

Also here, we have WARN_ON(!node), and then the next line dereferencing
node, which again is surely better off as BUG_ON. Although it's not
unreasonable to expect a function to OOPS if you pass it a null pointer.
I think only the higher calling level should have an

	if (WARN_ON(!node))
		return ...

or BUG_ON construct before calling this function. Only calls coming from
userland need to be treated with such care that anything can be passed,
not internal functions. We're not having BUG_ON(!dev) or
BUG_ON(!dev_priv) around either.
> > > +	/*
> > > +	 * In the case where the system is idle, starting 'min_seqno' from a big
> > > +	 * number will cause all nodes to be removed as they are now back to
> > > +	 * being in-order. However, this will be a problem if the last one to
> > > +	 * complete was actually out-of-order as the ring seqno value will be
> > > +	 * lower than one or more completed buffers. Thus code looking for the
> > > +	 * completion of said buffers will wait forever.
> > > +	 * Instead, use the hardware seqno as the starting point. This means
> > > +	 * that some buffers might be kept around even in a completely idle
> > > +	 * system but it should guarantee that no-one ever gets confused when
> > > +	 * waiting for buffer completion.
> > > +	 */
> > > +	min_seqno = ring->get_seqno(ring, true);
> > > +
> > > +	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
> > > +		if (I915_SQS_IS_QUEUED(node))
> > > +			queued++;
> > > +		else if (I915_SQS_IS_FLYING(node))
> > > +			flying++;
> > > +		else if (I915_SQS_IS_COMPLETE(node))
> > > +			continue;
> > > +
> > > +		if (node->params.request->seqno == 0)
> > > +			continue;
> > > +
> > > +		if (!i915_seqno_passed(node->params.request->seqno, min_seqno))
> > > +			min_seqno = node->params.request->seqno;
> > > +	}
> >
> > Couldn't these values be kept cached, instead of counting them in each
> > function?
>
> The 'queued' and 'flying' totals could be kept cached but min_seqno is
> dependent upon the state of the hardware so needs to be recalculated. In
> which case calculating the totals here is trivial and avoids having
> extra code elsewhere to keep them up to date.
>

Ok.

Btw, a for_each_scheduler_node or such would be a great macro. Those are
very much preferred for readability.
$ fgrep for_each_ drivers/gpu/drm/i915/* | wc -l
525

> > > +
> > > +	INIT_LIST_HEAD(&remove);
> > > +	list_for_each_entry_safe(node, node_next, &scheduler->node_queue[ring->id], link) {
> > > +		/*
> > > +		 * Only remove completed nodes which have a lower seqno than
> > > +		 * all pending nodes. While there is the possibility of the
> > > +		 * ring's seqno counting backwards, all higher buffers must
> > > +		 * be remembered so that the 'i915_seqno_passed()' test can
> > > +		 * report that they have in fact passed.
> > > +		 *
> > > +		 * NB: This is not true for 'dead' nodes. The GPU reset causes
> > > +		 * the software seqno to restart from its initial value. Thus
> > > +		 * the dead nodes must be removed even though their seqno values
> > > +		 * are potentially vastly greater than the current ring seqno.
> > > +		 */
> > > +		if (!I915_SQS_IS_COMPLETE(node))
> > > +			continue;
> > > +
> > > +		if (node->status != i915_sqs_dead) {
> > > +			if (i915_seqno_passed(node->params.request->seqno, min_seqno) &&
> > > +			    (node->params.request->seqno != min_seqno))
> > > +				continue;
> > > +		}
> > > +
> > > +		list_del(&node->link);
> > > +		list_add(&node->link, &remove);
> > > +
> > > +		/* Strip the dependency info while the mutex is still locked */
> > > +		i915_scheduler_remove_dependent(scheduler, node);
> > > +
> > > +		continue;
> > > +	}

<SNIP>

> > > +	do {
> > > +		WARN_ON(!node);
> > > +		WARN_ON(node->params.ring != ring);
> > > +		WARN_ON(node->status != i915_sqs_popped);
> > > +		count++;
> > > +
> > > +		/*
> > > +		 * The call to pop above will have removed the node from the
> > > +		 * list. So add it back in and mark it as in flight.
> > > +		 */
> > > +		i915_scheduler_fly_node(node);
> >
> > Why do we want to pull an object out of the list inside the spin lock
> > and push it back immediately in our critical code path? Seems like a
> > waste for no obvious gain at this point. Why don't we rather just
> > select an entry and modify it in place, if it's going to stay in the
> > same queue anyway.
>
> The list order is significant. The element must be moved to the front to
> keep the submitted items in submission order. Doing it this way also
> keeps the code nicely partitioned and easier to understand/maintain.
> Plus, there is a plan to optimise the code by splitting the one single
> list into three separate ones - queued, flying, complete. If/when that
> happens, the element will have to be removed from one list and added to
> another.
>

Ok.

> > > +
> > > +		scheduler->flags[ring->id] |= i915_sf_submitting;
> > > +		spin_unlock_irqrestore(&scheduler->lock, flags);
> > > +		ret = dev_priv->gt.execbuf_final(&node->params);
> > > +		spin_lock_irqsave(&scheduler->lock, flags);
> > > +		scheduler->flags[ring->id] &= ~i915_sf_submitting;
> > > +
> > > +		if (ret) {
> > > +			int requeue = 1;
> >
> > Multipurpose variable, not really a good idea. And as commented
> > further, it should not exist at all.
> >
> > > +
> > > +			/*
> > > +			 * Oh dear! Either the node is broken or the ring is
> > > +			 * busy. So need to kill the node or requeue it and try
> > > +			 * again later as appropriate.
> > > +			 */
> > > +
> > > +			switch (-ret) {
> > > +			case ENODEV:
> > > +			case ENOENT:
> > > +				/* Fatal errors. Kill the node. */
> > > +				requeue = -1;
> > > +			break;
> >
> > "break" indent is wrong.
> >
> > > +
> > > +			case EAGAIN:
> > > +			case EBUSY:
> > > +			case EIO:
> > > +			case ENOMEM:
> > > +			case ERESTARTSYS:
> > > +			case EINTR:
> > > +				/* Supposedly recoverable errors. */
> > > +				break;
> > > +
> > > +			default:
> > > +				/*
> > > +				 * Assume the error is recoverable and hope
> > > +				 * for the best.
> > > +				 */
> > > +				DRM_DEBUG_DRIVER("<%s> Got unexpected error from execfinal(): %d!\n",
> > > +						 ring->name, ret);
> >
> > There's a MISSING_CASE macro, should use it.
> >
> > > +				break;
> > > +			}
> > > +
> >
> > Just move the code below this point into the switch; no point having a
> > switch to categorize your options and then doing a bunch of ifs to
> > execute code that could be in the switch.
>
> One of the 'if' paths is to break out of the while loop. Can't do that
> from inside the switch.
>

I do think the code could still be simplified, even if it involved a
carefully placed goto.

> > > +			/*
> > > +			 * Check that the watchdog/reset code has not nuked
> > > +			 * the node while we weren't looking:
> > > +			 */
> > > +			if (node->status == i915_sqs_dead)
> > > +				requeue = 0;

Btw, just noticed, should some locking of node occur? Is it handled
gracefully if right after this check the status is changed, or do we
have a race condition here?

>
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx