On Wed, 2012-01-25 at 09:39 +0100, Thomas Hellstrom wrote: > On 01/25/2012 09:05 AM, Ben Skeggs wrote: > > On Wed, 2012-01-25 at 08:43 +0100, Thomas Hellstrom wrote: > >> On 01/25/2012 06:34 AM, Ben Skeggs wrote: > >>> From: Ben Skeggs<bskeggs@xxxxxxxxxx> > >>> > >>> Both changes in dc97b3409a790d2a21aac6e5cdb99558b5944119 cause serious > >>> regressions in the nouveau driver. > >>> > >>> move_notify() was originally able to presume that bo->mem is the old node, > >>> and new_mem is the new node. The above commit moves the call to > >>> move_notify() to after move() has been done, which means that now, sometimes, > >>> new_mem isn't the new node at all, bo->mem is, and new_mem points at a > >>> stale, possibly-just-been-killed-by-move node. > >>> > >>> This is clearly not a good situation. This patch reverts this change, and > >>> replaces it with a cleanup in the move() failure path instead. > >> While I have nothing against having move_notify() happening before the > >> move, but > >> I still very much dislike the failure path thing, since it puts a > >> requirement on TTM to support > >> driver moves through move_notify(), which is done by Radeon. > >> I've kindly asked Jerome to change that, stating a number of reasons, > >> but he refuses, and I'm > >> not prepared to let support for that mode of operation sneak in like this. > > What requirement is there? All it's asking the driver is to do a > > move_notify with old/new nodes swapped, which would effectively undo the > > previous move_notify(). > > > > move_notify() *cannot* happen after the move for the reasons I mentioned > > already, so the choice is apparently either to completely ignore > > cleaning up if the subsequent move() fails (previous behaviour prior to > > the patch which caused these regressions), or to make the change I > > did... > > I agree. What I'm proposing is that we should go through with the first > part of > your patch and ignore the cleanup. > > > > >> > >> The second issue is that the call to move_notify() from cleanup_memtype_use() > >> causes the TTM ghost objects to get passed into the driver. This is clearly > >> bad as the driver knows nothing about these "fake" TTM BOs, and ends up > >> accessing uninitialised memory. > >> > >> I worked around this in nouveau's move_notify() hook by ensuring the BO > >> destructor was nouveau's. I don't particularly like this solution, and > >> would rather TTM never pass the driver these objects. However, I don't > >> clearly understand the reason why we're calling move_notify() here anyway > >> and am happy to work around the problem in nouveau instead of breaking the > >> behaviour expected by other drivers. > >> > >> > >> move_notify() here gives the driver a chance to unbind any GPU maps > >> remaining on the BO > >> before it changes placement or is destroyed. > >> We've previously required the driver to support any type of BO in the > >> driver hooks, I agree > >> with you that it would be desirable to make that go away. > > Okay, that's fine I guess. As the commit says, I'm happy to make > > nouveau just ignore operations on BOs it doesn't "own". > > > > I know *you* think of move_notify() as only being around to deal with > > unmapping, but as I mentioned previously, it'd have never taken the new > > node as a parameter if this is were the case. > That's not exactly true. After you pointed that out, I went back and > check the > old vmwgfx use, and the new node was used to determine whether > an unbind was actually necessary, or whether it could leave the gpu map > untouched. Ok, but even so, prior to these changes there was no good reason nor mention in any of the headers/code that drivers weren't supposed to assume "new_mem" was anything useful.. > > > I have zero issue with > > nouveau using it to set up the GPU mappings currently. I know you've > > suggested alternatives previously, which may be possible down the track > > if it's *really* necessary, but it seems like a bad idea to pursue this > > for 3.3. > > My main concern is that we blindly and unnecessarily set up GPU bindings and > end up with unnecessary code in TTM, and furthermore that we communicate > that bad practice to future driver writers. This "unnecessary code" is like 5 lines of cleanup if something fails, hardly anything to be jumping up and down about :) > > Thomas, what do you suggest to move forward with this? Both of these > bugs are serious regressions that make nouveau unusable with the current > 3.3-rc series. Ben. > > My number one choice would of course be to have the drivers set up their > private GPU mappings after a > successful validate call, as I originally suggested and you agreed to. > > If that's not possible (I realize it's late in the release series), I'll > ack this patch if you and Jerome agree not to block > attempts to move in that direction for future kernel releases. I can't say I'm entirely happy with the plan honestly. To me, it still seems more efficient to handle this when a move happens (comparatively rare) and "map new backing storage into every vm that has a reference" than to (on every single buffer of every single "exec" call) go "is this buffer mapped into this channel's vm? yes, ok; no, lets go map it". I'm not even sure how exactly I plan on storing this mapping efficiently yet.. Scanning the BO's linked list of VMs it's mapped into for "if (this_vma == chan->vma)" doesn't exactly sound performant. Thanks, Ben. > > Thanks, > Thomas > > > > >> Signed-off-by: Ben Skeggs<bskeggs@xxxxxxxxxx> > >> Cc: Jerome Glisse<j.glisse@xxxxxxxxx> > >> --- > >> drivers/gpu/drm/nouveau/nouveau_bo.c | 4 ++++ > >> drivers/gpu/drm/ttm/ttm_bo.c | 17 +++++++++++++---- > >> 2 files changed, 17 insertions(+), 4 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c > >> index 724b41a..ec54364 100644 > >> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c > >> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c > >> @@ -812,6 +812,10 @@ nouveau_bo_move_ntfy(struct ttm_buffer_object *bo, struct ttm_mem_reg *new_mem) > >> struct nouveau_bo *nvbo = nouveau_bo(bo); > >> struct nouveau_vma *vma; > >> > >> + /* ttm can now (stupidly) pass the driver bos it didn't create... */ > >> + if (bo->destroy != nouveau_bo_del_ttm) > >> + return; > >> + > >> list_for_each_entry(vma,&nvbo->vma_list, head) { > >> if (new_mem&& new_mem->mem_type == TTM_PL_VRAM) { > >> nouveau_vm_map(vma, new_mem->mm_node); > >> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c > >> index 2f0eab6..7c3a57d 100644 > >> --- a/drivers/gpu/drm/ttm/ttm_bo.c > >> +++ b/drivers/gpu/drm/ttm/ttm_bo.c > >> @@ -404,6 +404,9 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo, > >> } > >> } > >> > >> + if (bdev->driver->move_notify) > >> + bdev->driver->move_notify(bo, mem); > >> + > >> if (!(old_man->flags& TTM_MEMTYPE_FLAG_FIXED)&& > >> !(new_man->flags& TTM_MEMTYPE_FLAG_FIXED)) > >> ret = ttm_bo_move_ttm(bo, evict, no_wait_reserve, no_wait_gpu, mem); > >> @@ -413,11 +416,17 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo, > >> else > >> ret = ttm_bo_move_memcpy(bo, evict, no_wait_reserve, no_wait_gpu, mem); > >> > >> - if (ret) > >> - goto out_err; > >> + if (ret) { > >> + if (bdev->driver->move_notify) { > >> + struct ttm_mem_reg tmp_mem = *mem; > >> + *mem = bo->mem; > >> + bo->mem = tmp_mem; > >> + bdev->driver->move_notify(bo, mem); > >> + bo->mem = *mem; > >> + } > >> > >> - if (bdev->driver->move_notify) > >> - bdev->driver->move_notify(bo, mem); > >> + goto out_err; > >> + } > >> > >> moved: > >> if (bo->evicted) { > >> > >> > >> > >> > > > > > _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel