On Thu, Mar 17, 2011 at 01:46:34PM +0000, Chris Wilson wrote: > On Thu, 17 Mar 2011 09:39:07 -0300, Herton Ronaldo Krzesinski <herton.krzesinski@xxxxxxxxxxxxx> wrote: > > I don't know if it's the most correct fix, but perhaps the simple fix > > is needed in the code. It's against latest Linus tree. We may have an > > already removed client_list, or we didn't add any item to client_list > > (file_priv NULL in i915_add_request) > > > > Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@xxxxxxxxxxxxx> > > --- > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > index 36e66cc..6077c0d 100644 > > --- a/drivers/gpu/drm/i915/i915_gem.c > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > @@ -1749,8 +1749,10 @@ i915_gem_request_remove_from_client(struct drm_i915_gem_request *request) > > return; > > > > spin_lock(&file_priv->mm.lock); > > - list_del(&request->client_list); > > - request->file_priv = NULL; > > + if (request->file_priv) { > > + list_del(&request->client_list); > > + request->file_priv = NULL; > > + } > > spin_unlock(&file_priv->mm.lock); > > } > > This is the single chunk required. I had thought that the actual > insertion/deletion was serialised under the struct mutex and the intention > of the spinlock was to protect the unlocked list traversal during > throttling. However, I missed that i915_gem_release() is also called > without struct mutex and so we do need the double check for > i915_gem_request_remove_from_client(). Ok. I just still have one doubt though, if in i915_add_request file/file_priv is NULL, wouldn't be possible to have an oops also in i915_gem_release without the check? As in this case, request->client_list wouldn't have mm.request_list added to it, and if an error occurs and i915_reset is called, which ends up calling i915_gem_release, we would try to do a list_del on request->client_list without items. If the check really isn't needed in i915_gem_release, then please consider this patch: From: Herton Ronaldo Krzesinski <herton.krzesinski@xxxxxxxxxxxxx> Avoid oops in i915_gem_request_remove_from_client client_list removal When i915_gem_retire_requests_ring calls i915_gem_request_remove_from_client, the client_list for that request may already be removed in i915_gem_release. So we may call twice list_del(&request->client_list), resulting in an oops like this report: [126167.230394] BUG: unable to handle kernel paging request at 00100104 [126167.230699] IP: [<f8c2ce44>] i915_gem_retire_requests_ring+0xd4/0x240 [i915] [126167.231042] *pdpt = 00000000314c1001 *pde = 0000000000000000 [126167.231314] Oops: 0002 [#1] SMP [126167.231471] last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0C0A:00/power_supply/BAT1/current_now [126167.231901] Modules linked in: snd_seq_dummy nls_utf8 isofs btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs exportfs reiserfs cryptd aes_i586 aes_generic binfmt_misc vboxnetadp vboxnetflt vboxdrv parport_pc ppdev snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_hwdep arc4 snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq uvcvideo videodev snd_timer snd_seq_device joydev iwlagn iwlcore mac80211 snd cfg80211 soundcore i915 drm_kms_helper snd_page_alloc psmouse drm serio_raw i2c_algo_bit video lp parport usbhid hid sky2 sdhci_pci ahci sdhci libahci [126167.232018] [126167.232018] Pid: 1101, comm: Xorg Not tainted 2.6.38-6-generic-pae #34-Ubuntu Gateway MC7833U / [126167.232018] EIP: 0060:[<f8c2ce44>] EFLAGS: 00213246 CPU: 0 [126167.232018] EIP is at i915_gem_retire_requests_ring+0xd4/0x240 [i915] [126167.232018] EAX: 00200200 EBX: f1ac25b0 ECX: 00000040 EDX: 00100100 [126167.232018] ESI: f1a2801c EDI: e87fc060 EBP: ef4d7dd8 ESP: ef4d7db0 [126167.232018] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [126167.232018] Process Xorg (pid: 1101, ti=ef4d6000 task=f1ba6500 task.ti=ef4d6000) [126167.232018] Stack: [126167.232018] f1a28000 f1a2809c f1a28094 0058bd97 f1aa2400 f1a2801c 0058bd7b 0058bd85 [126167.232018] f1a2801c f1a28000 ef4d7e38 f8c2e995 ef4d7e30 ef4d7e60 c14d1ebc f6b3a040 [126167.232018] f1522cc0 000000db 00000000 f1ba6500 ffffffa1 00000000 00000001 f1a29214 [126167.232018] Call Trace: Unfortunately the call trace reported was cut, but looking at debug symbols the crash is at __list_del, when probably list_del is called twice on the same request->client_list, as the dereferenced value is LIST_POISON1 + 4, and by looking more at the debug symbols before list_del call it should have being called by i915_gem_request_remove_from_client And as I can see in the code, it seems we indeed have the possibility to remove a request->client_list twice, which would cause the above, because we do list_del(&request->client_list) on both i915_gem_request_remove_from_client and i915_gem_release As Chris Wilson pointed out, it's indeed the case: "(...) I had thought that the actual insertion/deletion was serialised under the struct mutex and the intention of the spinlock was to protect the unlocked list traversal during throttling. However, I missed that i915_gem_release() is also called without struct mutex and so we do need the double check for i915_gem_request_remove_from_client()." This change does the required check to see if we really need to remove request->client_list. BugLink: http://bugs.launchpad.net/bugs/733780 Cc: stable@xxxxxxxxxx Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@xxxxxxxxxxxxx> --- drivers/gpu/drm/i915/i915_gem.c | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 950a5ab..fdf9166 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1753,8 +1753,10 @@ i915_gem_request_remove_from_client(struct drm_i915_gem_request *request) return; spin_lock(&file_priv->mm.lock); - list_del(&request->client_list); - request->file_priv = NULL; + if (request->file_priv) { + list_del(&request->client_list); + request->file_priv = NULL; + } spin_unlock(&file_priv->mm.lock); } -- 1.7.1 > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre > -- []'s Herton _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel