On Mon, Jan 27, 2014 at 09:56:12AM -0800, Volkin, Bradley D wrote:
> > +static void
> > +i915_mmu_notifier_del(struct i915_mmu_notifier *mmu,
> > +		      struct i915_mmu_object *mn)
> > +{
> > +	bool destroy;
> > +
> > +	spin_lock(&mmu->lock);
> > +	interval_tree_remove(&mn->it, &mmu->objects);
> > +	destroy = --mmu->count == 0;
> > +	__i915_mmu_notifier_update_serial(mmu);
> > +	spin_unlock(&mmu->lock);
> > +
> > +	if (destroy) /* protected against _add() by struct_mutex */
> > +		__i915_mmu_notifier_destroy(mmu);
>
> I see that we should hold struct_mutex when this function is called,
> but I don't see that we try to get the mutex anywhere on the _add()
> path. Have I missed something?

No, it's fixed in a later patch. I assumed I had the lock taken in the
outermost ioctl routine.

> > +free_mn:
> > +	kfree(mn);
> > +destroy_mmu:
> > +	if (mmu->count == 0)
> > +		__i915_mmu_notifier_destroy(mmu);
>
> Other accesses to mmu->count are protected by mmu->lock. Again, I may
> have missed something but don't immediately see why that's not
> required.

mmu->count is protected by struct_mutex. See above.

> > +	if (pinned < num_pages) {
> > +		if (pinned < 0) {
> > +			ret = pinned;
> > +			pinned = 0;
> > +		} else {
> > +			/* Spawn a worker so that we can acquire the
> > +			 * user pages without holding our mutex.
> > +			 */
> > +			ret = -EAGAIN;
> > +			if (obj->userptr.work == NULL) {
> > +				struct get_pages_work *work;
> > +
> > +				work = kmalloc(sizeof(*work), GFP_KERNEL);
> > +				if (work != NULL) {
> > +					obj->userptr.work = &work->work;
> > +
> > +					work->obj = obj;
> > +					drm_gem_object_reference(&obj->base);
> > +
> > +					work->task = current;
> > +					get_task_struct(work->task);
> > +
> > +					INIT_WORK(&work->work, __i915_gem_userptr_get_pages_worker);
> > +					schedule_work(&work->work);
>
> Any reason to use the system wq instead of the driver wq here?
> It doesn't look like it's the usual "takes modeset locks"
> justification.

Performance. Using the driver workqueue would serialise the clients,
whereas using the system workqueue lets us do the pagefaulting in
parallel.
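To make the performance point concrete, here is a minimal sketch of the
deferred-pinning pattern (illustrative only, not the patch itself; the
struct layout follows the quoted hunk, obj->userptr.ptr is an assumed
field, and the sg_table handover plus error paths are elided):

struct get_pages_work {
	struct work_struct work;
	struct drm_i915_gem_object *obj;
	struct task_struct *task;
};

/* Runs on the shared system_wq, so each client faults in parallel. */
static void __example_userptr_get_pages_worker(struct work_struct *_work)
{
	struct get_pages_work *work =
		container_of(_work, struct get_pages_work, work);
	struct drm_i915_gem_object *obj = work->obj;
	struct mm_struct *mm = work->task->mm;
	const int num_pages = obj->base.size >> PAGE_SHIFT;
	struct page **pvec;
	int pinned = 0;

	pvec = drm_malloc_ab(num_pages, sizeof(struct page *));
	if (pvec != NULL) {
		down_read(&mm->mmap_sem);
		/* This can fault and sleep, which is exactly why it must
		 * not run while holding our mutex, and why serialising it
		 * behind the single driver workqueue would stall clients.
		 */
		pinned = get_user_pages(work->task, mm,
					obj->userptr.ptr, /* assumed field */
					num_pages,
					1 /* write */, 0 /* force */,
					pvec, NULL);
		up_read(&mm->mmap_sem);
	}

	/* Elided: take struct_mutex, transfer the pinned pages into an
	 * sg_table (or record the error in obj->userptr.work), release
	 * surplus pages, and drop the object reference taken when the
	 * work was queued.
	 */

	put_task_struct(work->task);
	drm_free_large(pvec);
	kfree(work);
}

The only cost of the system workqueue is that the worker has to retake
any driver locks itself; in exchange, schedule_work() lets every
client's pagefaulting proceed concurrently.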
> > +				} else
> > +					ret = -ENOMEM;
> > +			} else {
> > +				if (IS_ERR(obj->userptr.work)) {
>
> } else if (...) { ?

No, I think it is clearer as is.

> > +					ret = PTR_ERR(obj->userptr.work);
> > +					obj->userptr.work = NULL;
> > +				}
> > +			}
> > +		}
> > +	} else {
> > +		struct sg_table *st;
> > +
> > +		st = kmalloc(sizeof(*st), GFP_KERNEL);
> > +		if (st == NULL || sg_alloc_table(st, num_pages, GFP_KERNEL)) {
> > +			kfree(st);
> > +			ret = -ENOMEM;
> > +		} else {
> > +			struct scatterlist *sg;
> > +			int n;
> > +
> > +			for_each_sg(st->sgl, sg, num_pages, n)
> > +				sg_set_page(sg, pvec[n], PAGE_SIZE, 0);
> > +
> > +			obj->pages = st;
> > +			obj->userptr.work = NULL;
> > +
> > +			pinned = 0;
> > +			ret = 0;
> > +		}
>
> This block is almost identical to code in the worker. Would it be
> worth extracting the common parts into a helper?

Almost, but subtly and importantly different. Only the loop was the
same, at which point I didn't feel like the saving was significant.
It is now even less similar.

> > +	}
> > +
> > +	release_pages(pvec, pinned, 0);
> > +	drm_free_large(pvec);
> > +	return ret;
> > +}
> > +
> > +static void
> > +i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj)
> > +{
> > +	struct scatterlist *sg;
> > +	int i;
> > +
> > +	if (obj->madv != I915_MADV_WILLNEED)
> > +		obj->dirty = 0;
>
> This is subtly different than similar code in the standard put_pages()
> in that it sets dirty=0 for both DONTNEED and PURGED instead of just
> DONTNEED (w/ BUG_ON(PURGED)). I don't think we will ever actually
> truncate userptr objects, so is there any reason for this to be
> different?

No, there's no reason for the difference. Semantically it is the same,
of course.

> > +/**
> > + * Creates a new mm object that wraps some normal memory from the
> > + * process context - user memory.
> > + *
> > + * We impose several restrictions upon the memory being mapped
> > + * into the GPU.
> > + * 1. It must be page aligned (both start/end addresses, i.e ptr and size).
> > + * 2. We only allow a bo as large as we could in theory map into the GTT,
> > + *    that is we limit the size to the total size of the GTT.
> > + * 3. The bo is marked as being snoopable. The backing pages are left
> > + *    accessible directly by the CPU, but reads by the GPU may incur
> > + *    the cost of a snoop (unless you have an LLC architecture).
>
> No overlapping ranges

Yes, that is an important addition.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre