Re: xserver crash with linux 4.6.0-rc3 and later

On Fri, Apr 29, 2016 at 01:25:30PM -0400, John S Gruber wrote:
> Starting with linux 4.6.0-rc3 my Ubuntu Wily system no longer allows logons
> due to an immediate abort in xserver just after entering my
> userid and password. (lightdm drew the sign on screen OK).
> 
> The xserver problem seems to result from a null pointer dereference in
> __kgem_retire_rq from package xserver-xorg-video-intel version
> 2:2.99.917+git20150808-0ubuntu4.
> 
> Bisecting the kernel I found that this was triggered by commit
> 426960bed3217f72a1b7bb94f084d79cc616ec0f. Reverting this commit based on
> 4.6-rc5 eliminated my crash.
> 
> The problem was specific to my HP Pavilion laptop with Intel HD 5500
> integrated graphics. A desktop Acer, also using Intel graphics, was
> fine. On the laptop it was completely consistent.
> 
> The laptop has:
> 
> 00:02.0 VGA compatible controller: Intel Corporation Broadwell-U
> Integrated Graphics (rev 09) (prog-if 00 [VGA controller])
>     DeviceName: Intel(R) Graphics GT2
> 
> Testing the laptop with Ubuntu xenial (with xserver-xorg-video-intel
> version 2:2.99.917+git20160325-1ubuntu1) was fine, however.
> 
> Please let me know if this is problematic, and if so, if I should provide
> additional information. I don't follow the list.
> 
> ----------------------
> 
> The triggering commit:
> 
> drm/i915: Seal busy-ioctl uABI and prevent leaking of internal ids

The seeds of that crash were already sown. The error is that on a batch
buffer allocation failure, the preallocated failsafe ended up on the
request list (which is not supposed to happen and so it runs off the end
of the list).
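
The failure mode described above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the actual sna/kgem code:
the names struct bo, struct request, struct queue, request_get,
request_submit and retire_requests are all hypothetical stand-ins. The
failsafe request has no buffer object, so it must never be linked into the
list that the retire path walks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real driver structures. */
struct bo {
    int busy;                   /* non-zero while the GPU still uses it */
};

struct request {
    struct bo *bo;              /* NULL for the preallocated failsafe */
    struct request *next;
};

struct queue {
    struct request *head;       /* list of submitted requests */
    struct request failsafe;    /* static fallback, has no batch bo */
};

/* On allocation failure (fresh == NULL) fall back to the failsafe. */
static struct request *request_get(struct queue *q, struct request *fresh)
{
    return fresh ? fresh : &q->failsafe;
}

/* Guarded submit: the failsafe is never added to the request list.
 * Returns 1 if the request was queued, 0 if it was dropped. */
static int request_submit(struct queue *q, struct request *rq)
{
    if (rq == &q->failsafe || rq->bo == NULL)
        return 0;
    rq->next = q->head;
    q->head = rq;
    return 1;
}

/* Retire path: dereferences rq->bo for every entry, so a queued
 * failsafe (bo == NULL) would crash here. Returns the number retired. */
static int retire_requests(struct queue *q)
{
    struct request **prev = &q->head;
    int retired = 0;

    while (*prev) {
        struct request *rq = *prev;

        assert(rq->bo != NULL);
        if (!rq->bo->busy) {
            *prev = rq->next;   /* unlink the completed request */
            rq->next = NULL;
            retired++;
        } else {
            prev = &rq->next;
        }
    }
    return retired;
}
```

Without the check in request_submit, the failsafe would sit on the list and
the retire loop would inspect its NULL bo much later, far from where the
allocation actually failed.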

commit 69d8edc11173df021aa2e158b2530257113141fd
Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Date:   Fri Aug 7 10:08:17 2015 +0100

    sna: Handle batch allocation failure
    
    Whilst we currently do not try and submit a failed batch buffer
    allocation, we still treat it as a valid request. This explodes much
    later when we inspect the NULL rq->bo.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=91577
    Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>

is the cause of the crash, but

commit 2d26643cab33a32847afaf13b50d326d09d58bf7
Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Date:   Fri Nov 13 19:03:36 2015 +0000

    sna/dri2: Drop the reference on the fence when complete
    
    Fixes regression from
    
    commit 8d9e496670f48b4eec64dfe1bcedb49793cf3073
    Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
    Date:   Wed Jul 22 11:14:01 2015 +0100
    
        sna/dri2: Take over the placeholder vblank
    
    After noting the fence was complete, we would clear it. But I forgot
    that we actually held a reference on to it, and so we would leak the 64k
    batch, and starve the system of available memory in about 18 minutes of
    SwapBuffers.
    
    Reported-by: Arkadiusz Miskiewicz <arekm@xxxxxxxx>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92911
    Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>

is where the bug began. The kernel just made it easier to hit the
pre-existing bugs in userspace.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



