Panic in drm_calc_timestamping_constants in staging-next

"ira.weiny" <ira.weiny@xxxxxxxxx> · Sun, 15 Nov 2015 13:17:00 -0500

With the latest staging-testing and staging-next[*] I am getting the following panic.

[*] git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git


[   11.232549] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
[   11.232568] IP: [<ffffffffa0103206>] drm_calc_timestamping_constants+0x86/0x130 [drm]
[   11.232571] PGD 0 
[   11.232574] Oops: 0002 [#1] SMP 
[   11.232595] Modules linked in: ib_qib mgag200(+) drm_kms_helper isci syscopyarea sysfillrect sysimgblt fb_sys_fops ib_mad ttm libsas mlx4_core(+) ib_core igb drm ahci scsi_transport_sas libahci ptp libata firewire_ohci ib_addr pps_core firewire_core dca i2c_algo_bit i2c_core crc_itu_t
[   11.232600] CPU: 13 PID: 497 Comm: systemd-udevd Not tainted 4.3.0+ #1
[   11.232601] Hardware name: Intel Corporation W2600CR ........../W2600CR, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[   11.232603] task: ffff8800343abfc0 ti: ffff8804244a8000 task.ti: ffff8804244a8000
[   11.232618] RIP: 0010:[<ffffffffa0103206>]  [<ffffffffa0103206>] drm_calc_timestamping_constants+0x86/0x130 [drm]
[   11.232620] RSP: 0018:ffff8804244ab118  EFLAGS: 00010246
[   11.232621] RAX: 0000000000fe4c00 RBX: ffff880424b10160 RCX: 0000000000000540
[   11.232623] RDX: 0000000000000000 RSI: 000000000000fde8 RDI: ffff880424b10000
[   11.232624] RBP: ffff8804244ab148 R08: ffff8804244a8000 R09: 000000029d828339
[   11.232626] R10: 00000000000050c4 R11: 0000000000000000 R12: 0000000000fe4c00
[   11.232627] R13: ffff880424b10000 R14: 0000000000000000 R15: 000000000000fde8
[   11.232629] FS:  00007fecf960d880(0000) GS:ffff88082d940000(0000) knlGS:0000000000000000
[   11.232631] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   11.232632] CR2: 00000000000000b0 CR3: 0000000424493000 CR4: 00000000000406e0
[   11.232634] Stack:
[   11.232637]  ffff8804244ab148 ffff880424b10000 ffff88042a86bb40 ffff88042a86b800
[   11.232639]  ffff88042a86bb48 ffff88042a86bb40 ffff8804244ab378 ffffffffa030c7e7
[   11.232642]  ffff880424b10090 0000000000000000 ffff880424b10160 0000000000000000
[   11.232642] Call Trace:
[   11.232655]  [<ffffffffa030c7e7>] drm_crtc_helper_set_mode+0x3d7/0x4b0 [drm_kms_helper]
[   11.232665]  [<ffffffffa030d7d4>] drm_crtc_helper_set_config+0x8d4/0xb10 [drm_kms_helper]
[   11.232683]  [<ffffffffa010c874>] drm_mode_set_config_internal+0x64/0x100 [drm]
[   11.232694]  [<ffffffffa0319352>] drm_fb_helper_pan_display+0xa2/0x280 [drm_kms_helper]
[   11.232703]  [<ffffffff81395a8b>] fb_pan_display+0xbb/0x170
[   11.232708]  [<ffffffff8138fd80>] bit_update_start+0x20/0x50
[   11.232712]  [<ffffffff8138e62b>] fbcon_switch+0x39b/0x590
[   11.232721]  [<ffffffff8140d260>] redraw_screen+0x1a0/0x240
[   11.232725]  [<ffffffff8140dc28>] vc_do_resize+0x4d8/0x500
[   11.232729]  [<ffffffff8140dc6f>] vc_resize+0x1f/0x30
[   11.232732]  [<ffffffff8138ec32>] fbcon_init+0x342/0x530
[   11.232737]  [<ffffffff8140b8ea>] visual_init+0xca/0x130
[   11.232741]  [<ffffffff8140dff6>] do_bind_con_driver+0x146/0x310
[   11.232746]  [<ffffffff8140e4e1>] do_take_over_console+0x141/0x1b0
[   11.232750]  [<ffffffff8138a187>] do_fbcon_takeover+0x57/0xb0
[   11.232754]  [<ffffffff8138f79b>] fbcon_event_notify+0x60b/0x750
[   11.232760]  [<ffffffff810a5889>] notifier_call_chain+0x49/0x70
[   11.232764]  [<ffffffff810a5bcd>] __blocking_notifier_call_chain+0x4d/0x70
[   11.232768]  [<ffffffff810a5c06>] blocking_notifier_call_chain+0x16/0x20
[   11.232772]  [<ffffffff8139563b>] fb_notifier_call_chain+0x1b/0x20
[   11.232775]  [<ffffffff81397691>] register_framebuffer+0x1f1/0x330
[   11.232784]  [<ffffffffa031a9ba>] drm_fb_helper_initial_config+0x27a/0x3d0 [drm_kms_helper]
[   11.232792]  [<ffffffffa0341b4d>] mgag200_fbdev_init+0xdd/0xf0 [mgag200]
[   11.232798]  [<ffffffffa0340586>] mgag200_modeset_init+0x176/0x1e0 [mgag200]
[   11.232804]  [<ffffffffa033c659>] mgag200_driver_load+0x3f9/0x580 [mgag200]
[   11.232819]  [<ffffffffa0106007>] drm_dev_register+0xa7/0xb0 [drm]
[   11.232834]  [<ffffffffa01084ef>] drm_get_pci_dev+0x8f/0x1e0 [drm]
[   11.232840]  [<ffffffffa034137b>] mga_pci_probe+0x9b/0xc0 [mgag200]
[   11.232848]  [<ffffffff813690f5>] local_pci_probe+0x45/0xa0
[   11.232853]  [<ffffffff8136a53c>] pci_device_probe+0xfc/0x140
[   11.232858]  [<ffffffff8145566b>] driver_probe_device+0x21b/0x460
[   11.232861]  [<ffffffff81455935>] __driver_attach+0x85/0x90
[   11.232864]  [<ffffffff814558b0>] ? driver_probe_device+0x460/0x460
[   11.232868]  [<ffffffff8145337c>] bus_for_each_dev+0x6c/0xc0
[   11.232871]  [<ffffffff81454fce>] driver_attach+0x1e/0x20
[   11.232873]  [<ffffffff81454ae0>] bus_add_driver+0x1d0/0x290
[   11.232876]  [<ffffffff814562e0>] driver_register+0x60/0xe0
[   11.232880]  [<ffffffff81368a9c>] __pci_register_driver+0x4c/0x50
[   11.232894]  [<ffffffffa0108720>] drm_pci_init+0xe0/0x110 [drm]
[   11.232897]  [<ffffffffa0348000>] ? 0xffffffffa0348000
[   11.232902]  [<ffffffffa0348032>] mgag200_init+0x32/0x1000 [mgag200]
[   11.232907]  [<ffffffff8100213d>] do_one_initcall+0xcd/0x1f0
[   11.232911]  [<ffffffff811c5e56>] ? __vunmap+0xa6/0xf0
[   11.232918]  [<ffffffff811e2c1b>] ? kmem_cache_alloc_trace+0x17b/0x1e0
[   11.232921]  [<ffffffff81185243>] ? do_init_module+0x27/0x1e8
[   11.232924]  [<ffffffff8118527c>] do_init_module+0x60/0x1e8
[   11.232930]  [<ffffffff8110a6e3>] load_module+0x12b3/0x1980
[   11.232933]  [<ffffffff81106b10>] ? __symbol_put+0x60/0x60
[   11.232938]  [<ffffffff81106f80>] ? copy_module_from_fd.isra.51+0x110/0x160
[   11.232943]  [<ffffffff8110afbf>] SyS_finit_module+0x9f/0xd0
[   11.232949]  [<ffffffff8169146e>] entry_SYSCALL_64_fastpath+0x12/0x71
[   11.232976] Code: f6 31 d2 41 89 c2 8b 83 b4 00 00 00 0f af c1 48 98 48 69 c0 40 42 0f 00 48 f7 f6 f6 43 74 10 41 89 c4 75 26 f6 05 fa 6f 03 00 01 <45> 89 96 b0 00 00 00 45 89 a6 ac 00 00 00 75 35 48 83 c4 08 5b
[   11.232990] RIP  [<ffffffffa0103206>] drm_calc_timestamping_constants+0x86/0x130 [drm]
[   11.232991]  RSP <ffff8804244ab118>
[   11.232992] CR2: 00000000000000b0
[   11.232996] ---[ end trace 402fdf8659b2f760 ]---
[   11.238445] Kernel panic - not syncing: Fatal exception
[   11.238510] Kernel Offset: disabled


I believe it is related to (but not directly caused by) this commit:


commit eba1f35dfe145247c7eb690c7c32740fde8ec699
Author: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
Date:   Mon Sep 14 22:43:43 2015 +0300

    drm: Move timestamping constants into drm_vblank_crtc
    
    Collect the timestamping constants alongside the rest of the relevant
    stuff under drm_vblank_crtc.
    
    We can now get rid of the 'refcrtc' parameter to
    drm_calc_vbltimestamp_from_scanoutpos().
    
    Signed-off-by: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
    Reviewed-by: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>
    Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxx>



The reason I think it is not caused by the above commit is that when I run with
this commit I get a __hang__ rather than a panic.  But running with the parent
commit (below) works just fine:

commit 942840371cde152fe57c15e0e8483b760e7763e3
Author: Matt Roper <matthew.d.roper@xxxxxxxxx>
Date:   Mon Sep 21 17:21:48 2015 -0700

    drm/fbdev: Update legacy plane->fb refcounting for atomic restore
    
    Starting with commit
    
            commit 28cc504e8d52248962f5b485bdc65f539e3fe21d
            Author: Rob Clark <robdclark@xxxxxxxxx>
            Date:   Tue Aug 25 15:36:00 2015 -0400
    
                drm/i915: enable atomic fb-helper
    
    I've been seeing some panics on i915 when the DRM master shuts down that appear
    to be caused by using an already-freed framebuffer (i.e., we're unexpectedly
    dropping our initial FB's reference count to 0 and freeing it, which causes a
    crash when we try to restore it later).  Digging deeper, the state FB
    refcounting is working as expected, but we seem to be missing proper
    refcounting on the legacy plane->fb pointers in the new atomic fbdev code.
    
    Tracking plane->old_fb and then doing a ref/unref at the end of the
    fbdev restore like we do in the legacy ioctl's ensures we don't miscount
    references on plane->fb and avoids the panics.
    
    v2 from Daniel:
    
    Really do what the atomic ioctl does:
    - Also update plane->fb and plane->crtc.
    - Clear out plane->old_fb on failures too.
    
    v3: git add everything. Oops.
    
    v4: Also clear old_fb in all other failure paths, spotted by David.
    
    Cc: Rob Clark <robdclark@xxxxxxxxx>
    Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx
    Cc: David Herrmann <dh.herrmann@xxxxxxxxx>
    Cc: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>
    Signed-off-by: Matt Roper <matthew.d.roper@xxxxxxxxx> (v1)
    Reviewd-by: David Herrmann <dh.herrmann@xxxxxxxxx>
    Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxx>


Because I am getting a hang, I'm not quite sure how to continue the bisect
beyond the commit in question.

A bit of digging suggests that vblank may not have been allocated at all.
In drm_calc_timestamping_constants(), vblank is obtained as follows:

        struct drm_vblank_crtc *vblank = &crtc->dev->vblank[drm_crtc_index(crtc)];

To check, I applied the following hack:

12:46:25 > git di

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index eba6337f5860..649c32c00b36 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -641,8 +641,13 @@ void drm_calc_timestamping_constants(struct drm_crtc *crtc,
                DRM_ERROR("crtc %u: Can't calculate constants, dotclock = 0!\n",
                          crtc->base.id);
 
-       vblank->linedur_ns  = linedur_ns;
-       vblank->framedur_ns = framedur_ns;
+       if ((u64)vblank < 1000) {
+               DRM_ERROR("crtc %u: Can't calculate linedur_ns or framedur_ns; vblank %p; drm_crtc_index(crtc) %d\n",
+                         crtc->base.id, vblank, drm_crtc_index(crtc));
+       } else {
+               vblank->linedur_ns  = linedur_ns;
+               vblank->framedur_ns = framedur_ns;
+       }
 
        DRM_DEBUG("crtc %u: hwmode: htotal %d, vtotal %d, vdisplay %d\n",
                  crtc->base.id, mode->crtc_htotal,


I got the following output.

kernel: [drm:drm_calc_timestamping_constants [drm]] *ERROR* crtc 19: Can't calculate linedur_ns or framedur_ns; vblank (null); drm_crtc_index(crtc) 0

So this must mean that the vblank array is not allocated yet?

What intervening patch between 4.3 and the current staging-next might change
where/how vblank is allocated?

Thanks,
Ira

_______________________________________________
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxx
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel