On 11.04.2017 at 11:45, Huang Rui wrote:
> On Tue, Apr 11, 2017 at 02:53:27PM +0800, Christian König wrote:
>> On 11.04.2017 at 04:58, Huang Rui wrote:
>>> This patch fixes the case where buffer funcs is empty while a bo evict
>>> is executing. The buffer funcs must be checked again; otherwise a NULL
>>> pointer dereference kernel panic will be encountered.
>>>
>>> BUG: unable to handle kernel NULL pointer dereference at 00000000000001a4
>>> IP: [<ffffffffa067b6cd>] amdgpu_evict_flags+0x3d/0xf0 [amdgpu]
>>> PGD 0
>>>
>>> Oops: 0000 [#1] SMP
>>> Modules linked in: amdgpu(OE) ttm drm_kms_helper drm i2c_algo_bit
>>> fb_sys_fops syscopyarea sysfillrect sysimgblt fmem(OE) physmem_drv(OE)
>>> rpcsec_gss_krb5 nfsv4 nfs fscache intel_rapl x86_pkg_temp_thermal
>>> intel_powerclamp snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic
>>> kvm_intel snd_hda_intel snd_hda_codec kvm snd_hda_core joydev eeepc_wmi
>>> asus_wmi sparse_keymap snd_hwdep snd_pcm irqbypass crct10dif_pclmul
>>> snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq crc32_pclmul snd_seq_device
>>> ghash_clmulni_intel aesni_intel aes_x86_64 snd_timer lrw gf128mul mei_me snd
>>> glue_helper ablk_helper cryptd tpm_infineon mei lpc_ich serio_raw soundcore
>>> shpchp mac_hid nfsd auth_rpcgss nfs_acl lockd grace coretemp sunrpc parport_pc
>>> ppdev lp parport autofs4 hid_generic mxm_wmi r8169 usbhid ahci
>>> psmouse libahci nvme mii hid nvme_core wmi video
>>> CPU: 3 PID: 1627 Comm: kworker/u8:17 Tainted: G OE 4.9.0-custom #1
>>> Hardware name: ASUS All Series/Z87-A, BIOS 1802 01/28/2014
>>> Workqueue: events_unbound async_run_entry_fn
>>> task: ffff88021e7057c0 task.stack: ffffc9000262c000
>>> RIP: 0010:[<ffffffffa067b6cd>] [<ffffffffa067b6cd>] amdgpu_evict_flags+0x3d/0xf0 [amdgpu]
>>> RSP: 0018:ffffc9000262fb30 EFLAGS: 00010246
>>> RAX: 0000000000000000 RBX: ffff88021e8a5858 RCX: 0000000000000000
>>> RDX: 0000000000000001 RSI: ffffc9000262fb58 RDI: ffff88021e8a5800
>>> RBP: ffffc9000262fb48 R08: 0000000000000000 R09: ffff88021e8a5814
>>> R10: 000000001def8f01 R11: ffff88021def8c80 R12: ffffc9000262fb58
>>> R13: ffff88021d2b1990 R14: 0000000000000000 R15: ffff88021e8a5858
>>> FS: 0000000000000000(0000) GS:ffff88022ed80000(0000) knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 00000000000001a4 CR3: 0000000001c07000 CR4: 00000000001406e0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>
>> Can we have the full stack trace please?
>>
>> Eviction should never occur before the buffer funcs are initialized, so
>> that patch just papers over some kind of race condition on startup as
>> far as I can see.
>>
> If we set ip_block_mask=0xff, the SDMA IP block won't be enabled, so
> funcs_ring is NULL at that time. It is a corner case, but we still don't
> expect it to hang with a kernel panic. I hit it while debugging S3.

Good point, you should mention that in the commit message.

With that fixed the patch is Reviewed-by: Christian König <christian.koenig at amd.com>.

But I think there are a couple more cases like this that you might need to fix to work with the SDMA disabled.

Regards,
Christian.

> Thanks,
> Rui
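
For reference, the kind of guard discussed in this thread could look roughly like the standalone sketch below. It is not the actual patch and does not use the real amdgpu structures: `struct mman`, `struct ring`, `copy_engine_usable()` and `pick_evict_placement()` are hypothetical names invented for illustration, and falling back to a CPU placement when no copy engine exists is an assumption, not something stated in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real amdgpu/TTM definitions. */
struct ring {
    bool ready;                              /* engine ring has been brought up */
};

struct copy_funcs {
    int unused;                              /* placeholder for copy callbacks */
};

struct mman {
    const struct copy_funcs *buffer_funcs;   /* NULL if the copy engine is disabled */
    struct ring *buffer_funcs_ring;          /* NULL if the copy engine is disabled */
};

enum placement { PLACE_CPU, PLACE_GTT };

/*
 * Check both pointers before touching ring->ready. Reading ->ready
 * through a NULL ring pointer is the class of bug reported above.
 */
static bool copy_engine_usable(const struct mman *m)
{
    return m->buffer_funcs != NULL &&
           m->buffer_funcs_ring != NULL &&
           m->buffer_funcs_ring->ready;
}

/* Assumed policy: without a usable copy engine, evict to a CPU placement. */
static enum placement pick_evict_placement(const struct mman *m)
{
    return copy_engine_usable(m) ? PLACE_GTT : PLACE_CPU;
}
```

With both pointers NULL (the SDMA-disabled case), `pick_evict_placement()` returns `PLACE_CPU` without dereferencing anything. Other callers that read the ring state directly would need the same check.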