Hi Ville,

On Tue, Oct 13, 2015 at 06:04:40PM +0300, Ville Syrjälä wrote:
> On Tue, Jun 30, 2015 at 10:06:27AM +0100, Lukas Wunner wrote:
> > From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> >
> > We had two failure modes here:
> >
> > 1.
> > Deadlock in intelfb_alloc failure path where it calls
> > drm_framebuffer_remove, which grabs the struct mutex and intelfb_create
> > (caller of intelfb_alloc) was already holding it.
> >
> > 2.
> > Deadlock in intelfb_create failure path where it calls
> > drm_framebuffer_unreference, which grabs the struct mutex and
> > intelfb_create was already holding it.
> >
> > v2:
> > * Reformat commit msg to 72 chars. (Lukas Wunner)
> > * Add third failure mode. (Lukas Wunner)
> >
> > v3:
> > * On fb alloc failure, unref gem object where it gets refed,
> >   fix double unref in separate commit. (Ville Syrjälä)
> >
> > v4:
> > * Lock struct mutex on unref. (Chris Wilson)
> >
> > v5:
> > * Rebase on drm-intel-nightly 2015y-09m-04d-08h-19m-35s UTC,
> >   rephrase commit message.
> >   (Jani Nicula)
> >
> > Tested-by: Pierre Moreau <pierre.morrow@xxxxxxx>
> >     [MBP 5,3 2009 nvidia 9400M + 9600M GT pre-retina]
> > Tested-by: Paul Hordiienko <pvt.gord@xxxxxxxxx>
> >     [MBP 6,2 2010 intel ILK + nvidia GT216 pre-retina]
> > Tested-by: William Brown <william@xxxxxxxxxxxxxxxx>
> >     [MBP 8,2 2011 intel SNB + amd turks pre-retina]
> > Tested-by: Lukas Wunner <lukas@xxxxxxxxx>
> >     [MBP 9,1 2012 intel IVB + nvidia GK107 pre-retina]
> > Tested-by: Bruno Bierbaumer <bruno@xxxxxxxxxxxxxx>
> >     [MBP 11,3 2013 intel HSW + nvidia GK107 retina]
> >
> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > Fixes: 60a5ca015ffd ("drm/i915: Add locking around
> >     framebuffer_references--")
> > Reported-by: Lukas Wunner <lukas@xxxxxxxxx>
> > [Lukas: Create v3 + v4 + v5 based on Tvrtko's v2]
> > Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx>
> > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > Cc: Jani Nikula <jani.nikula@xxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/intel_fbdev.c | 20 ++++++++++++--------
> >  1 file changed, 12 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
> > index 96476d7..eee3306 100644
> > --- a/drivers/gpu/drm/i915/intel_fbdev.c
> > +++ b/drivers/gpu/drm/i915/intel_fbdev.c
> > @@ -119,7 +119,7 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
> >  {
> >          struct intel_fbdev *ifbdev =
> >                  container_of(helper, struct intel_fbdev, helper);
> > -        struct drm_framebuffer *fb;
> > +        struct drm_framebuffer *fb = NULL;
> >          struct drm_device *dev = helper->dev;
> >          struct drm_mode_fb_cmd2 mode_cmd = {};
> >          struct drm_i915_gem_object *obj;
> > @@ -137,6 +137,8 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
> >          mode_cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
> >                                                            sizes->surface_depth);
> >
> > +        mutex_lock(&dev->struct_mutex);
> > +
> >          size = mode_cmd.pitches[0] * mode_cmd.height;
> >          size =
> >              PAGE_ALIGN(size);
> >          obj = i915_gem_object_create_stolen(dev, size);
> > @@ -158,18 +160,21 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
> >          ret = intel_pin_and_fence_fb_obj(NULL, fb, NULL, NULL, NULL);
> >          if (ret) {
> >                  DRM_ERROR("failed to pin obj: %d\n", ret);
> > -                goto out_fb;
> > +                goto out_unref;
> >          }
> >
> > +        mutex_unlock(&dev->struct_mutex);
> > +
> >          ifbdev->fb = to_intel_framebuffer(fb);
> >
> >          return 0;
> >
> > -out_fb:
> > -        drm_framebuffer_remove(fb);
> >  out_unref:
> >          drm_gem_object_unreference(&obj->base);
>
> If fb init succeeded it took over the ref, no? So drm_framebuffer_remove()
> will now attempt to unref one too many times.
>
> This taking over refs stuff is confusing. Maybe it would be better if
> everyone just took an extra ref when they stash the obj pointer
> somewhere, and everyone would then always release whatever ref they own
> and no longer need.
>
> >  out:
> > +        mutex_unlock(&dev->struct_mutex);
> > +        if (fb)
> > +                drm_framebuffer_remove(fb);
> >          return ret;
> >  }

Hm, why do you think we unref one too many times?

A bit further up in this function we call __intel_framebuffer_create(),
which sets the refcount to 1. (It calls intel_framebuffer_init(), which
calls drm_framebuffer_init(), which calls kref_init(&fb->refcount).)

So if intel_pin_and_fence_fb_obj() fails, we do need to unreference and
tear down the fb. Thus, drm_framebuffer_remove() seems right here to me.

However, because of your objection I've now noticed that "if (fb)" seems
to be wrong; I think this should be "if (!IS_ERR_OR_NULL(fb))". If
__intel_framebuffer_create() failed, fb will be an ERR_PTR(), so not
NULL, and we'd call drm_framebuffer_remove() on it.

Is that what you meant?
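
To illustrate the "if (fb)" concern, here is a minimal userspace sketch
(not kernel code). The helpers err_ptr(), ptr_err(), is_err() and
is_err_or_null() are stand-ins modelled on <linux/err.h>; struct fake_fb,
fake_fb_remove() and the two guard functions are hypothetical names made
up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
        return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
        return (long)(intptr_t)ptr;
}

/* Error pointers live in the top MAX_ERRNO bytes of the address space. */
static inline bool is_err(const void *ptr)
{
        return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool is_err_or_null(const void *ptr)
{
        return ptr == NULL || is_err(ptr);
}

/* Hypothetical stand-in for struct drm_framebuffer. */
struct fake_fb {
        int refcount;
};

/* Teardown stand-in: only valid for a real framebuffer pointer. */
static inline void fake_fb_remove(struct fake_fb *fb)
{
        assert(!is_err_or_null(fb));
        fb->refcount--;
}

/* What a plain "if (fb)" guard decides. */
static inline bool naive_guard_would_remove(const struct fake_fb *fb)
{
        return fb != NULL;
}

/* What an "if (!IS_ERR_OR_NULL(fb))" guard decides. */
static inline bool strict_guard_would_remove(const struct fake_fb *fb)
{
        return !is_err_or_null(fb);
}
```

With fb set to err_ptr(-ENOMEM), the naive guard lets the teardown run on
a pointer that is not a framebuffer at all, while the strict guard skips
it for both NULL and error pointers.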

> > @@ -187,8 +192,6 @@ static int intelfb_create(struct drm_fb_helper *helper,
> >          int size, ret;
> >          bool prealloc = false;
> >
> > -        mutex_lock(&dev->struct_mutex);
> > -
> >          if (intel_fb &&
> >              (sizes->fb_width > intel_fb->base.width ||
> >               sizes->fb_height > intel_fb->base.height)) {
> > @@ -203,7 +206,7 @@ static int intelfb_create(struct drm_fb_helper *helper,
> >                  DRM_DEBUG_KMS("no BIOS fb, allocating a new one\n");
> >                  ret = intelfb_alloc(helper, sizes);
> >                  if (ret)
> > -                        goto out_unlock;
> > +                        return ret;
> >                  intel_fb = ifbdev->fb;
> >          } else {
> >                  DRM_DEBUG_KMS("re-using BIOS fb\n");
> > @@ -215,6 +218,8 @@ static int intelfb_create(struct drm_fb_helper *helper,
> >          obj = intel_fb->obj;
> >          size = obj->base.size;
> >
> > +        mutex_lock(&dev->struct_mutex);
> > +
>
> I'm thinking we won't even need the lock here anymore. But maybe I'm
> missing something.
>
> >          info = drm_fb_helper_alloc_fbi(helper);
> >          if (IS_ERR(info)) {
> >                  ret = PTR_ERR(info);
> > @@ -276,7 +281,6 @@ out_destroy_fbi:
> >  out_unpin:
> >          i915_gem_object_ggtt_unpin(obj);
> >          drm_gem_object_unreference(&obj->base);
>
> And this ref we don't own either AFAICS.

Why? We did call intelfb_alloc() above, so if something subsequently
goes wrong, we need to revert the steps that intelfb_alloc() carried
out. The drm_gem_object_unreference() therefore seems right here to me.

However, I'm puzzled why we don't call drm_framebuffer_remove() under
the out_unpin: label. Aren't we leaking a framebuffer here without that?

Maybe you're referring to the fact that this function either inherits
the BIOS fb or creates a new fb with intelfb_alloc(). I'm not sure
whether the cleanup on error is identical in these two cases. Maybe you
meant that we don't own the ref in the case where the fb was inherited
from the BIOS?
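
If the two cases do need different cleanup, one way to express "only
drop the ref we took ourselves" is sketched below as a userspace toy
model. This is an assumption about how the ownership could be split, not
a description of what intelfb_create() does; struct toy_obj, toy_get(),
toy_put() and toy_create() are hypothetical names, and only the prealloc
flag is borrowed from the quoted code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for the GEM object. */
struct toy_obj {
        int refcount;
};

static inline void toy_get(struct toy_obj *obj)
{
        obj->refcount++;
}

static inline void toy_put(struct toy_obj *obj)
{
        assert(obj->refcount > 0);
        obj->refcount--;
}

/*
 * Toy create function. When prealloc is true the object was inherited
 * and we hold no reference of our own; otherwise we take one, modelling
 * the allocation path. On failure only our own reference is dropped.
 */
static inline int toy_create(struct toy_obj *obj, bool prealloc,
                             bool fail_later)
{
        if (!prealloc)
                toy_get(obj);

        if (fail_later) {
                if (!prealloc)
                        toy_put(obj);   /* undo only what we did */
                return -ENOMEM;
        }

        return 0;
}
```

In this model a failure leaves the refcount exactly where it was before
the call in both cases, so an inherited object is never released by the
error path.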

Best regards,

Lukas

>
> > -out_unlock:
> >          mutex_unlock(&dev->struct_mutex);
> >          return ret;
> >  }
> > --
> > 2.1.0
>
> --
> Ville Syrjälä
> Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx