On Tue, 2010-11-09 at 11:07 +0100, Thomas Hellstrom wrote:
> On 11/09/2010 10:53 AM, Thomas Hellstrom wrote:
> > On 11/09/2010 10:29 AM, Markus Trippelsdorf wrote:
> >> OK I've found the buggy commit by bisection:
> >>
> >> e376573f7267390f4e1bdc552564b6fb913bce76 is the first bad commit
> >> commit e376573f7267390f4e1bdc552564b6fb913bce76
> >> Author: Michel Dänzer <daenzer@xxxxxxxxxx>
> >> Date:   Thu Jul 8 12:43:28 2010 +1000
> >>
> >>     drm/radeon: fall back to GTT if bo creation/validation in VRAM
> >>     fails.
> >>
> >>     This fixes a problem where on low VRAM cards we'd run out of
> >>     space for validation.
> >>
> >>     [airlied: Tested on my M7, Thinkpad T42, compiz works with no
> >>     problems.]
> >>
> >>     Signed-off-by: Michel Dänzer <daenzer@xxxxxxxxxx>
> >>     Cc: stable@xxxxxxxxxx
> >>     Signed-off-by: Dave Airlie <airlied@xxxxxxxxxx>
> >>
> >> Please note that this is an old commit from 2.6.36-rc. When I revert
> >> it the kernel no longer crashes. Instead I see the following in my
> >> dmesg:
> >>
> >
> > Hmm, so this sounds like something in the Radeon eviction error path
> > is causing corruption.
> > I had a similar problem with vmwgfx, when I tried to unref a BO
> > _after_ ttm_bo_init() failed.
> > ttm_bo_init() is really supposed to call unref itself for various
> > reasons, so calling unref() or kfree() after a failed ttm_bo_init()
> > will cause corruption.
> >
> > In any case, the error below also suggests something is a bit fragile
> > in the Radeon driver:
> >
> > First, an accelerated eviction may fail, like in the message below,
> > but then there must always be a backup plan, like unaccelerated
> > eviction to system. On BO creation, there are a number of placement
> > strategies, but if all else fails, it should be possible to initially
> > place the BO in system memory.
> >
> > Second, If bo validation fails during a command submission, due to
> > insufficient VRAM / TT, then the driver should retry the complete
> > validation cycle after first blocking all other validators and then
> > evicting everything not pinned, to avoid failures due to fragmentation.
> >
> > /Thomas
> >
>
> Indeed, it seems like the commit you mention just retries ttm_bo_init()
> after it previously failed. At that point the bo has been destroyed, so
> that is probably what's causing the BUG you are seeing.
>
> Admittedly, ttm_bo_init() calling unref on failure is not properly
> documented in the function description. The reason for doing so is to
> have a single path for freeing all BO resources already allocated on the
> point of failure.

Does the patch below fix the problem?


commit e224472eedbda391ddb6d8b88f26e82e1c3b036b
Author: Michel Dänzer <daenzer@xxxxxxxxxx>
Date:   Tue Nov 9 11:30:41 2010 +0100

    drm/radeon/kms: Fix retrying ttm_bo_init() after it failed once.

    If ttm_bo_init() returns failure, it already destroyed the BO, so we need to
    retry from scratch.

    Signed-off-by: Michel Dänzer <daenzer@xxxxxxxxxx>
    Cc: stable@xxxxxxxxxx

diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 1b9004e..bbe92d5 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -102,6 +102,8 @@ int radeon_bo_create(struct radeon_device *rdev, struct drm_gem_object *gobj,
 		type = ttm_bo_type_device;
 	}
 	*bo_ptr = NULL;
+
+retry:
 	bo = kzalloc(sizeof(struct radeon_bo), GFP_KERNEL);
 	if (bo == NULL)
 		return -ENOMEM;
@@ -109,8 +111,6 @@ int radeon_bo_create(struct radeon_device *rdev, struct drm_gem_object *gobj,
 	bo->gobj = gobj;
 	bo->surface_reg = -1;
 	INIT_LIST_HEAD(&bo->list);
-
-retry:
 	radeon_ttm_placement_from_domain(bo, domain);
 	/* Kernel allocation are uninterruptible */
 	mutex_lock(&rdev->vram_mutex);

-- 
Earthling Michel Dänzer           |                http://www.vmware.com
Libre software enthusiast         |          Debian, X and DRI developer
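
For anyone following the thread, the ownership rule discussed above (the
init routine destroys the object when it fails, so a retry has to allocate
again) can be sketched in plain userspace C. This is only an illustration
under stated assumptions: demo_bo, demo_bo_alloc(), demo_bo_free(),
demo_bo_init(), demo_bo_create() and the live_objects counter are all
hypothetical stand-ins, not the real radeon or TTM code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a buffer object; NOT the real struct radeon_bo. */
struct demo_bo {
	int placement;
};

enum { DEMO_VRAM = 1, DEMO_GTT = 2 };

/* Counts objects allocated but not yet freed, to make leaks visible. */
int live_objects;

struct demo_bo *demo_bo_alloc(void)
{
	struct demo_bo *bo = calloc(1, sizeof(*bo));

	if (bo != NULL)
		live_objects++;
	return bo;
}

void demo_bo_free(struct demo_bo *bo)
{
	free(bo);
	live_objects--;
}

/*
 * Mimics the contract described in the thread: on failure the init
 * routine releases the object itself, so the caller must not free it
 * or reuse the pointer afterwards.
 */
int demo_bo_init(struct demo_bo *bo, int placement, bool vram_full)
{
	if (placement == DEMO_VRAM && vram_full) {
		demo_bo_free(bo);	/* single cleanup path on failure */
		return -ENOMEM;
	}
	bo->placement = placement;
	return 0;
}

/*
 * Retry shaped like the patched function: the label sits before the
 * allocation, so every attempt starts with a fresh object.
 */
int demo_bo_create(bool vram_full, struct demo_bo **out)
{
	int placement = DEMO_VRAM;
	struct demo_bo *bo;
	int r;

	assert(out != NULL);
	*out = NULL;
retry:
	bo = demo_bo_alloc();
	if (bo == NULL)
		return -ENOMEM;

	r = demo_bo_init(bo, placement, vram_full);
	if (r != 0) {
		/* bo is already gone here; fall back and start over. */
		if (placement == DEMO_VRAM) {
			placement = DEMO_GTT;
			goto retry;
		}
		return r;
	}
	*out = bo;
	return 0;
}
```

Placing the label after the allocation instead would hand an already freed
pointer back to demo_bo_init() on the second pass, which corresponds to the
use-after-free suspected in this thread.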