Re: TTM refcount problem.

On 16.10.19 at 12:09, Bas Nieuwenhuizen wrote:
On Mon, Jul 29, 2019 at 11:32 AM Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:
Is this a known issue?
No, that looks like a new one to me.

Is that somehow reproducible?
I tried to find a reliable reproducer (only Vulkan CTS runs caught it,
and only rarely), but could not find anything better.

However, this issue seems to be fixed by one of the following patches
from drm-misc-fixes:

"drm/ttm: fix handling in ttm_bo_add_mem_to_lru"
"drm/ttm: fix busy reference in ttm_mem_evict_first"

I haven't seen the issue in 100 CTS runs.

Thanks for the information.

I'm currently reworking the handling completely and trying to get rid of all the reference dropping that just results in a BUG().

Issues like that one will then hopefully completely disappear.
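
To illustrate the failure class, here is a minimal userspace sketch, not the TTM code. The RIP in the quoted trace points at ttm_bo_ref_bug, which suggests a reference being dropped at a point where it was assumed not to be the last one. Everything named here (struct ref, ref_put, release_bug, simulate) is hypothetical, and the sketch records the event where the kernel would call BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a kernel-style reference counter. */
struct ref {
	int count;
};

typedef void (*release_fn)(struct ref *r);

/* Set when a "must not be the last reference" put releases the object. */
bool bug_hit;
/* Set when the object is released through the expected final put. */
bool released;

void ref_init(struct ref *r) { r->count = 1; }
void ref_get(struct ref *r) { r->count++; }

/* Drop one reference; run release() if it was the last one. */
void ref_put(struct ref *r, release_fn release)
{
	assert(r->count > 0);
	if (--r->count == 0)
		release(r);
}

/* Where a kernel helper would call BUG(), this sketch only records the event. */
void release_bug(struct ref *r)
{
	(void)r;
	bug_hit = true;
	fprintf(stderr, "refcount imbalance: last reference dropped unexpectedly\n");
}

/* Normal release path for the final, expected put. */
void release_object(struct ref *r)
{
	(void)r;
	released = true;
}

/*
 * Simulate an object that sits on one list. extra_put models a second path
 * that wrongly drops the list reference again.
 * Returns true if the BUG-style release callback fired.
 */
bool simulate(bool extra_put)
{
	struct ref r;

	bug_hit = false;
	released = false;

	ref_init(&r);	/* owner reference, count = 1 */
	ref_get(&r);	/* list reference, count = 2 */

	ref_put(&r, release_bug);		/* removed from list, count = 1 */
	if (extra_put)
		ref_put(&r, release_bug);	/* imbalance: count = 0, callback fires */
	else
		ref_put(&r, release_object);	/* owner drops the last reference */

	return bug_hit;
}
```

Calling simulate(false) takes the balanced path and releases the object normally, while simulate(true) drops the list reference twice and ends up in the callback that stands in for BUG(). Removing such "this put can never be the last one" call sites is one way to read the rework described above, though the actual patches may look quite different.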

Regards,
Christian.


Thanks,
Bas

Christian.

On 29.07.19 at 10:14, Bas Nieuwenhuizen wrote:
Hi all,

I have a TTM refcount issue:

[173774.309968] ------------[ cut here ]------------
[173774.309970] kernel BUG at drivers/gpu/drm/ttm/ttm_bo.c:202!
[173774.309982] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[173774.309985] CPU: 13 PID: 128214 Comm: kworker/13:2 Not tainted
5.2.0-rc1-g3f2e519b0974 #10
[173774.309986] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./X399 Taichi, BIOS P1.50 09/05/2017
[173774.309995] Workqueue: events ttm_bo_delayed_workqueue [ttm]
[173774.310000] RIP: 0010:ttm_bo_ref_bug+0x5/0x10 [ttm]
[173774.310002] Code: c0 c3 b8 01 00 00 00 c3 66 66 2e 0f 1f 84 00 00
00 00 00 66 90 0f 1f 44 00 00 f0 ff 8f a4 00 00 00 c3 0f 1f 00 0f 1f
44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 8b 07
48 89
[173774.310003] RSP: 0018:ffffb42e5589bde8 EFLAGS: 00010246
[173774.310005] RAX: ffffb42e5589be40 RBX: ffff9395fd0cd908 RCX:
ffff9395fd0cd8f8
[173774.310006] RDX: ffffb42e5589be40 RSI: ffff939b59b64f18 RDI:
ffff9395fd0cd87c
[173774.310007] RBP: ffffffffc0930f40 R08: 0000000000140000 R09:
ffffffffc091f100
[173774.310008] R10: ffff9399f69b0800 R11: 0000000000000001 R12:
0000000000000000
[173774.310009] R13: ffff9395fd0cd850 R14: 0000000000000001 R15:
0000000000000001
[173774.310010] FS:  0000000000000000(0000) GS:ffff939b7d340000(0000)
knlGS:0000000000000000
[173774.310011] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[173774.310012] CR2: 00007f4f64008838 CR3: 0000000643baa000 CR4:
00000000003406e0
[173774.310013] Call Trace:
[173774.310019]  ttm_bo_cleanup_refs+0x160/0x1e0 [ttm]
[173774.310025]  ttm_bo_delayed_delete+0xa8/0x1e0 [ttm]
[173774.310029]  ttm_bo_delayed_workqueue+0x17/0x40 [ttm]
[173774.310033]  process_one_work+0x1fd/0x430
[173774.310036]  worker_thread+0x2d/0x3d0
[173774.310038]  ? process_one_work+0x430/0x430
[173774.310040]  kthread+0x112/0x130
[173774.310042]  ? kthread_create_on_node+0x60/0x60
[173774.310045]  ret_from_fork+0x22/0x40
[173774.310048] Modules linked in: fuse nct6775 hwmon_vid
nls_iso8859_1 nls_cp437 vfat fat edac_mce_amd kvm_amd kvm irqbypass
amdgpu arc4 iwlmvm mac80211 snd_usb_audio uvcvideo snd_usbmidi_lib
videobuf2_vmalloc crct10dif_pclmul videobuf2_memops
snd_hda_codec_realtek videobuf2_v4l2 btusb gpu_sched snd_rawmidi
videobuf2_common snd_hda_codec_generic btrtl videodev crc32_pclmul
btbcm snd_seq_device ledtrig_audio ttm btintel ghash_clmulni_intel
wmi_bmof mxm_wmi snd_hda_codec_hdmi media bluetooth drm_kms_helper
iwlwifi snd_hda_intel drm aesni_intel snd_hda_codec joydev input_leds
aes_x86_64 snd_hda_core mousedev evdev crypto_simd cryptd ecdh_generic
led_class agpgart snd_hwdep mac_hid cdc_acm glue_helper ecc snd_pcm
igb syscopyarea pcspkr cfg80211 sysfillrect snd_timer sysimgblt snd
fb_sys_fops ccp ptp soundcore pps_core rng_core k10temp i2c_algo_bit
sp5100_tco dca i2c_piix4 rfkill wmi pcc_cpufreq button acpi_cpufreq
sch_fq_codel ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2
sd_mod
[173774.310085]  hid_generic usbhid hid crc32c_intel ahci xhci_pci
libahci xhci_hcd libata usbcore scsi_mod usb_common
[173774.310094] ---[ end trace 1f8d21980c0b3fd5 ]---
[173774.310097] RIP: 0010:ttm_bo_ref_bug+0x5/0x10 [ttm]
[173774.310099] Code: c0 c3 b8 01 00 00 00 c3 66 66 2e 0f 1f 84 00 00
00 00 00 66 90 0f 1f 44 00 00 f0 ff 8f a4 00 00 00 c3 0f 1f 00 0f 1f
44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 8b 07
48 89
[173774.310100] RSP: 0018:ffffb42e5589bde8 EFLAGS: 00010246
[173774.310101] RAX: ffffb42e5589be40 RBX: ffff9395fd0cd908 RCX:
ffff9395fd0cd8f8
[173774.310102] RDX: ffffb42e5589be40 RSI: ffff939b59b64f18 RDI:
ffff9395fd0cd87c
[173774.310103] RBP: ffffffffc0930f40 R08: 0000000000140000 R09:
ffffffffc091f100
[173774.310104] R10: ffff9399f69b0800 R11: 0000000000000001 R12:
0000000000000000
[173774.310104] R13: ffff9395fd0cd850 R14: 0000000000000001 R15:
0000000000000001
[173774.310106] FS:  0000000000000000(0000) GS:ffff939b7d340000(0000)
knlGS:0000000000000000
[173774.310107] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[173774.310107] CR2: 00007f4f64008838 CR3: 0000000643baa000 CR4:
00000000003406e0
[173774.310110] note: kworker/13:2[128214] exited with preempt_count 1


With amd-staging-drm-next:

commit 20d6b9c3b7f40ec427af912d140f2be0de098d2d (origin/amd-staging-drm-next)
Author: Gustavo A. R. Silva <gustavo@xxxxxxxxxxxxxx>
Date:   Mon Jul 22 12:47:16 2019 -0500

      drm/amdkfd/kfd_mqd_manager_v10: Avoid fall-through warning

with a Vega10.

Is this a known issue?

Thanks,
Bas
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



