Re: Issues with trying to boot falcons from sgt memory + Possible firmware SG_DEBUG fix?

On 19/4/24 06:27, Lyude Paul wrote:

So - first some context here for Ben and anyone else who hasn't been
following. A little while ago I got a Slimbook Executive 16 with an
Nvidia RTX 4060 in it, and I've unfortunately been running into a kind
of annoying issue. Currently this laptop only has 16 GB of RAM, and
as it turns out - this can easily lead to pretty heavy memory
fragmentation once the system starts swapping pages out.

Normally this wouldn't matter, but I unfortunately discovered that when
we're runtime suspending the GPU in Nouveau - we actually appear to
allocate some of the memory we use for migrating using
dma_alloc_coherent. This starts to fail on my system once memory
fragmentation goes up like so:

   kworker/18:0: page allocation failure: order:7, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
   CPU: 18 PID: 287012 Comm: kworker/18:0 Not tainted 6.8.4-200.ChopperV1.fc39.x86_64 #1
   Hardware name: SLIMBOOK Executive/Executive, BIOS N.1.10GRU06 02/02/2024
   Workqueue: pm pm_runtime_work
   Call Trace:
    <TASK>
    dump_stack_lvl+0x47/0x60
    warn_alloc+0x165/0x1e0
    ? __alloc_pages_direct_compact+0x1ad/0x2b0
    __alloc_pages_slowpath.constprop.0+0xd7d/0xde0
    __alloc_pages+0x32d/0x350
    __dma_direct_alloc_pages.isra.0+0x16a/0x2b0
    dma_direct_alloc+0x70/0x280
    nvkm_gsp_radix3_sg+0x5e/0x130 [nouveau]
    r535_gsp_fini+0x1d4/0x350 [nouveau]
    nvkm_subdev_fini+0x67/0x150 [nouveau]
    nvkm_device_fini+0x95/0x1e0 [nouveau]
    nvkm_udevice_fini+0x53/0x70 [nouveau]
    nvkm_object_fini+0xb9/0x240 [nouveau]
    nvkm_object_fini+0x75/0x240 [nouveau]
    nouveau_do_suspend+0xf5/0x280 [nouveau]
    nouveau_pmops_runtime_suspend+0x3e/0xb0 [nouveau]
    pci_pm_runtime_suspend+0x67/0x1e0
    ? __pfx_pci_pm_runtime_suspend+0x10/0x10
    __rpm_callback+0x41/0x170
    ? __pfx_pci_pm_runtime_suspend+0x10/0x10
    rpm_callback+0x5d/0x70
    ? __pfx_pci_pm_runtime_suspend+0x10/0x10
    rpm_suspend+0x120/0x6a0
    pm_runtime_work+0x98/0xb0
    process_one_work+0x171/0x340
    worker_thread+0x27b/0x3a0
    ? __pfx_worker_thread+0x10/0x10
    kthread+0xe5/0x120
    ? __pfx_kthread+0x10/0x10
    ret_from_fork+0x31/0x50
    ? __pfx_kthread+0x10/0x10
    ret_from_fork_asm+0x1b/0x30

   nouveau 0000:01:00.0: gsp: suspend failed, -12
   nouveau: DRM-master:00000000:00000080: suspend failed with -12
   nouveau 0000:01:00.0: can't suspend (nouveau_pmops_runtime_suspend [nouveau] returned -12)
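
For anyone reading the trace above without the buddy allocator in their
head, the two numbers that matter are order:7 and -12. A minimal sketch
in Python for decoding them, assuming 4 KiB base pages (typical on
x86_64, but not stated in the log); the helper names are made up for
illustration:

```python
import errno

PAGE_SIZE = 4096  # assumption: 4 KiB base pages, as on typical x86_64 systems


def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size in bytes of a page-allocator request of the given order.

    An order-N request asks for 2**N physically contiguous pages.
    """
    return (1 << order) * page_size


def errno_name(ret: int) -> str:
    """Symbolic name for a negative kernel-style return code such as -12."""
    return errno.errorcode.get(-ret, "unknown")


if __name__ == "__main__":
    size = order_to_bytes(7)
    print(f"order:7 -> {1 << 7} pages, {size // 1024} KiB")
    print(f"-12 -> {errno_name(-12)}")
```

Under that assumption, order:7 is 128 pages, i.e. 512 KiB that must be
physically contiguous, and -12 is ENOMEM. The same helper maps the -5
in the later probe failure to EIO.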

Keep in mind, I don't dive into memory management related stuff like
this very often! But I'd very much like to know how to help out
anywhere around the driver, including outside of my usual domains, so
I've been trying to write up a patch for this. The original suggestion
for a fix that Dave Airlie had given me was (unless I misunderstood,
which isn't unlikely) to try to see if we could get nvkm_gsp_mem_ctor()
to start allocating memory with vmalloc() and map that onto the GPU
using the SG helpers instead. So - I gave a shot at writing up a patch
for doing that:

https://gitlab.freedesktop.org/lyudess/linux/-/commit/b5a41ac2bd948979815d262d8d20b4f3333f9c26

As you can probably guess - the patch does not really seem to work, and
I've been trying to figure out why. There are already a couple of issues
I'm aware of: the most glaring one being that, as Timur pointed out, a
lot of GSP hardware expects contiguous memory allocations - but
according to them the allocation that's specifically failing should be
small enough to fit within a single page, and so be contiguous anyway:

    [    9.429884] Lyude:r535_gsp_init:2186: (mbox1) == 0
    [    9.429898] Lyude:r535_gsp_init:2186: (mbox0) == dbdfe000
    [    9.491300] ------------[ cut here ]------------
    [    9.491308] WARNING: CPU: 5 PID: 921 at drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:1713 r535_gsp_init+0x75e/0x7c0 [nouveau]
    [    9.491533] Modules linked in: nouveau(+) rfkill binfmt_misc vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep wmi_bmof ppdev snd_hda_core drm_ttm_helper intel_rapl_msr snd_seq ttm snd_seq_device snd_pcm video gpu_sched snd_timer i2c_algo_bit drm_gpuvm drm_exec intel_rapl_common mxm_wmi rapl snd drm_display_helper acpi_cpufreq soundcore k10temp i2c_piix4 parport_pc wmi parport gpio_amdpt gpio_generic loop dm_multipath nfnetlink zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 r8169 realtek sha1_ssse3 ccp w83627hf_wdt scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse
    [    9.491670] CPU: 5 PID: 921 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #22
    [    9.491681] Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
    [    9.491690] RIP: 0010:r535_gsp_init+0x75e/0x7c0 [nouveau]
    [    9.491885] Code: 8b 83 10 0d 00 00 48 89 ef 41 bf e4 ff ff ff 48 8b 40 18 48 8b 80 48 0f 00 00 48 8b 40 28 e8 b9 5e 89 ee 0f 0b e9 73 f9 ff ff <0f> 0b 41 bf fb ff ff ff e9 5a f9 ff ff 41 89 ef 0f 0b e9 5c f9 ff
    [    9.491905] RSP: 0018:ffffb271c175f748 EFLAGS: 00010246
    [    9.491914] RAX: 0000000000000000 RBX: ffffa098e192f000 RCX: ffffa098ca2768c8
    [    9.491922] RDX: ffffa098e191d400 RSI: ffffb271cc110080 RDI: ffffb271cc111388
    [    9.491930] RBP: 00000000dbdfe000 R08: 0000000000000003 R09: 0000000000000000
    [    9.491938] R10: 0000000000000000 R11: ffffa098ca276828 R12: ffffa098e192f008
    [    9.491946] R13: 000000022b906452 R14: ffffa098e192f008 R15: 0000000000000000
    [    9.491956] FS:  00007f4de98cc980(0000) GS:ffffa099c4a80000(0000) knlGS:0000000000000000
    [    9.491966] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [    9.491974] CR2: 00007f7bd8d18ea0 CR3: 0000000104e58000 CR4: 00000000003506f0
    [    9.491989] Call Trace:
    [    9.491996]  <TASK>
    [    9.492002]  ? __warn+0x80/0x120
    [    9.492012]  ? r535_gsp_init+0x75e/0x7c0 [nouveau]
    [    9.492200]  ? report_bug+0x164/0x190
    [    9.492211]  ? handle_bug+0x3c/0x80
    [    9.492218]  ? exc_invalid_op+0x17/0x70
    [    9.492227]  ? asm_exc_invalid_op+0x1a/0x20
    [    9.492241]  ? r535_gsp_init+0x75e/0x7c0 [nouveau]
    [    9.492429]  ? r535_gsp_init+0x18e/0x7c0 [nouveau]
    [    9.492616]  ? srso_return_thunk+0x5/0x5f
    [    9.492626]  nvkm_subdev_init_+0x48/0x130 [nouveau]
    [    9.492802]  ? srso_return_thunk+0x5/0x5f
    [    9.492810]  nvkm_subdev_init+0x44/0x90 [nouveau]
    [    9.492988]  nvkm_device_init+0x166/0x2e0 [nouveau]
    [    9.493189]  nvkm_udevice_init+0x47/0x70 [nouveau]
    [    9.493391]  nvkm_object_init+0x41/0x1c0 [nouveau]
    [    9.493567]  nvkm_ioctl_new+0x16a/0x290 [nouveau]
    [    9.493740]  ? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
    [    9.493912]  ? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
    [    9.494121]  nvkm_ioctl+0x10e/0x250 [nouveau]
    [    9.494288]  nvif_object_ctor+0x112/0x190 [nouveau]
    [    9.494456]  nvif_device_ctor+0x23/0x60 [nouveau]
    [    9.494625]  nouveau_cli_init+0x164/0x5d0 [nouveau]
    [    9.494820]  nouveau_drm_device_init+0x97/0xe00 [nouveau]
    [    9.495022]  ? srso_return_thunk+0x5/0x5f
    [    9.495030]  ? pci_bus_read_config_word+0x4d/0x90
    [    9.495039]  ? srso_return_thunk+0x5/0x5f
    [    9.495047]  ? pci_update_current_state+0x72/0xb0
    [    9.495059]  nouveau_drm_probe+0x12c/0x280 [nouveau]
    [    9.495245]  ? srso_return_thunk+0x5/0x5f
    [    9.495254]  local_pci_probe+0x45/0xa0
    [    9.495263]  pci_device_probe+0xc7/0x240
    [    9.495272]  really_probe+0xd6/0x390
    [    9.495282]  ? __pfx___driver_attach+0x10/0x10
    [    9.495290]  __driver_probe_device+0x78/0x150
    [    9.495301]  driver_probe_device+0x1f/0x90
    [    9.495308]  __driver_attach+0xd2/0x1c0
    [    9.495316]  bus_for_each_dev+0x88/0xd0
    [    9.495325]  bus_add_driver+0x116/0x220
    [    9.495334]  driver_register+0x59/0x100
    [    9.495342]  ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
    [    9.495512]  do_one_initcall+0x5b/0x320
    [    9.495524]  do_init_module+0x60/0x240
    [    9.495536]  init_module_from_file+0x86/0xc0
    [    9.495550]  idempotent_init_module+0x120/0x2b0
    [    9.495562]  __x64_sys_finit_module+0x5e/0xb0
    [    9.495571]  do_syscall_64+0x88/0x170
    [    9.495581]  ? srso_return_thunk+0x5/0x5f
    [    9.495589]  ? syscall_exit_to_user_mode_prepare+0x15d/0x190
    [    9.495600]  ? srso_return_thunk+0x5/0x5f
    [    9.495607]  ? syscall_exit_to_user_mode+0x60/0x210
    [    9.495615]  ? srso_return_thunk+0x5/0x5f
    [    9.495622]  ? do_syscall_64+0x95/0x170
    [    9.495630]  ? srso_return_thunk+0x5/0x5f
    [    9.495636]  ? do_syscall_64+0x95/0x170
    [    9.495644]  ? srso_return_thunk+0x5/0x5f
    [    9.495653]  entry_SYSCALL_64_after_hwframe+0x71/0x79
    [    9.495663] RIP: 0033:0x7f4de9b2919d
    [    9.495680] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4b cc 0c 00 f7 d8 64 89 01 48
    [    9.495697] RSP: 002b:00007ffc56bfe468 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
    [    9.495707] RAX: ffffffffffffffda RBX: 00005644a0432350 RCX: 00007f4de9b2919d
    [    9.495717] RDX: 0000000000000000 RSI: 00005644a042ef30 RDI: 0000000000000031
    [    9.495726] RBP: 00007ffc56bfe520 R08: 00007f4de9bf6b20 R09: 00007ffc56bfe4b0
    [    9.495734] R10: 00005644a04346a0 R11: 0000000000000246 R12: 00005644a042ef30
    [    9.495742] R13: 0000000000020000 R14: 00005644a0432d10 R15: 00005644a0434660
    [    9.495754]  </TASK>
    [    9.495759] ---[ end trace 0000000000000000 ]---
    [    9.495778] ------------[ cut here ]------------
    [    9.495784] WARNING: CPU: 5 PID: 921 at drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:2187 r535_gsp_init+0xc5/0x7c0 [nouveau]
    [    9.495981] Modules linked in: nouveau(+) rfkill binfmt_misc vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep wmi_bmof ppdev snd_hda_core drm_ttm_helper intel_rapl_msr snd_seq ttm snd_seq_device snd_pcm video gpu_sched snd_timer i2c_algo_bit drm_gpuvm drm_exec intel_rapl_common mxm_wmi rapl snd drm_display_helper acpi_cpufreq soundcore k10temp i2c_piix4 parport_pc wmi parport gpio_amdpt gpio_generic loop dm_multipath nfnetlink zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 r8169 realtek sha1_ssse3 ccp w83627hf_wdt scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse
    [    9.496112] CPU: 5 PID: 921 Comm: (udev-worker) Tainted: G        W          6.9.0-rc3Lyude-Test+ #22
    [    9.496123] Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
    [    9.496132] RIP: 0010:r535_gsp_init+0xc5/0x7c0 [nouveau]
    [    9.496317] Code: 24 18 4c 8d 63 08 89 6c 24 14 4c 89 e6 6a 00 4c 8d 44 24 20 48 8d 4c 24 1c e8 b7 c3 fa ff 5f 41 89 c7 85 c0 0f 84 97 00 00 00 <0f> 0b 48 83 bb 20 0a 00 00 00 75 37 48 8b 44 24 20 65 48 2b 04 25
    [    9.496333] RSP: 0018:ffffb271c175f748 EFLAGS: 00010246
    [    9.496341] RAX: 0000000000000000 RBX: ffffa098e192f000 RCX: ffffa098ca2768c8
    [    9.496351] RDX: ffffa098e191d400 RSI: ffffb271cc110080 RDI: ffffb271cc111388
    [    9.496360] RBP: 00000000dbdfe000 R08: 0000000000000003 R09: 0000000000000000
    [    9.496368] R10: 0000000000000000 R11: ffffa098ca276828 R12: ffffa098e192f008
    [    9.496375] R13: 000000022b906452 R14: ffffa098e192f008 R15: 00000000fffffffb
    [    9.496383] FS:  00007f4de98cc980(0000) GS:ffffa099c4a80000(0000) knlGS:0000000000000000
    [    9.496393] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [    9.496400] CR2: 00007f7bd8d18ea0 CR3: 0000000104e58000 CR4: 00000000003506f0
    [    9.496410] Call Trace:
    [    9.496416]  <TASK>
    [    9.496422]  ? __warn+0x80/0x120
    [    9.496429]  ? r535_gsp_init+0xc5/0x7c0 [nouveau]
    [    9.496622]  ? report_bug+0x164/0x190
    [    9.496631]  ? handle_bug+0x3c/0x80
    [    9.496638]  ? exc_invalid_op+0x17/0x70
    [    9.496647]  ? asm_exc_invalid_op+0x1a/0x20
    [    9.496660]  ? r535_gsp_init+0xc5/0x7c0 [nouveau]
    [    9.496851]  ? r535_gsp_init+0x18e/0x7c0 [nouveau]
    [    9.497044]  ? srso_return_thunk+0x5/0x5f
    [    9.497055]  nvkm_subdev_init_+0x48/0x130 [nouveau]
    [    9.497227]  ? srso_return_thunk+0x5/0x5f
    [    9.497236]  nvkm_subdev_init+0x44/0x90 [nouveau]
    [    9.497405]  nvkm_device_init+0x166/0x2e0 [nouveau]
    [    9.497608]  nvkm_udevice_init+0x47/0x70 [nouveau]
    [    9.497808]  nvkm_object_init+0x41/0x1c0 [nouveau]
    [    9.497983]  nvkm_ioctl_new+0x16a/0x290 [nouveau]
    [    9.498154]  ? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
    [    9.498326]  ? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
    [    9.498531]  nvkm_ioctl+0x10e/0x250 [nouveau]
    [    9.498702]  nvif_object_ctor+0x112/0x190 [nouveau]
    [    9.498873]  nvif_device_ctor+0x23/0x60 [nouveau]
    [    9.499049]  nouveau_cli_init+0x164/0x5d0 [nouveau]
    [    9.499244]  nouveau_drm_device_init+0x97/0xe00 [nouveau]
    [    9.499430]  ? srso_return_thunk+0x5/0x5f
    [    9.499437]  ? pci_bus_read_config_word+0x4d/0x90
    [    9.499445]  ? srso_return_thunk+0x5/0x5f
    [    9.499452]  ? pci_update_current_state+0x72/0xb0
    [    9.499461]  nouveau_drm_probe+0x12c/0x280 [nouveau]
    [    9.499657]  ? srso_return_thunk+0x5/0x5f
    [    9.499666]  local_pci_probe+0x45/0xa0
    [    9.499674]  pci_device_probe+0xc7/0x240
    [    9.499683]  really_probe+0xd6/0x390
    [    9.499692]  ? __pfx___driver_attach+0x10/0x10
    [    9.499699]  __driver_probe_device+0x78/0x150
    [    9.499709]  driver_probe_device+0x1f/0x90
    [    9.499718]  __driver_attach+0xd2/0x1c0
    [    9.499726]  bus_for_each_dev+0x88/0xd0
    [    9.499735]  bus_add_driver+0x116/0x220
    [    9.499744]  driver_register+0x59/0x100
    [    9.499751]  ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
    [    9.499915]  do_one_initcall+0x5b/0x320
    [    9.499926]  do_init_module+0x60/0x240
    [    9.499934]  init_module_from_file+0x86/0xc0
    [    9.499948]  idempotent_init_module+0x120/0x2b0
    [    9.499962]  __x64_sys_finit_module+0x5e/0xb0
    [    9.499971]  do_syscall_64+0x88/0x170
    [    9.499987]  ? srso_return_thunk+0x5/0x5f
    [    9.499996]  ? syscall_exit_to_user_mode_prepare+0x15d/0x190
    [    9.500004]  ? srso_return_thunk+0x5/0x5f
    [    9.500011]  ? syscall_exit_to_user_mode+0x60/0x210
    [    9.500019]  ? srso_return_thunk+0x5/0x5f
    [    9.500026]  ? do_syscall_64+0x95/0x170
    [    9.500034]  ? srso_return_thunk+0x5/0x5f
    [    9.500041]  ? do_syscall_64+0x95/0x170
    [    9.500050]  ? srso_return_thunk+0x5/0x5f
    [    9.500058]  entry_SYSCALL_64_after_hwframe+0x71/0x79
    [    9.500067] RIP: 0033:0x7f4de9b2919d
    [    9.500075] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4b cc 0c 00 f7 d8 64 89 01 48
    [    9.500091] RSP: 002b:00007ffc56bfe468 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
    [    9.500103] RAX: ffffffffffffffda RBX: 00005644a0432350 RCX: 00007f4de9b2919d
    [    9.500112] RDX: 0000000000000000 RSI: 00005644a042ef30 RDI: 0000000000000031
    [    9.500121] RBP: 00007ffc56bfe520 R08: 00007f4de9bf6b20 R09: 00007ffc56bfe4b0
    [    9.500128] R10: 00005644a04346a0 R11: 0000000000000246 R12: 00005644a042ef30
    [    9.500136] R13: 0000000000020000 R14: 00005644a0432d10 R15: 00005644a0434660
    [    9.500149]  </TASK>
    [    9.500154] ---[ end trace 0000000000000000 ]---
    [    9.500162] nouveau 0000:1f:00.0: gsp: init failed, -5
    [    9.500189] nouveau 0000:1f:00.0: init failed with -5
    [    9.500196] nouveau: DRM-master:00000000:00000080: init failed with -5
    [    9.500207] nouveau 0000:1f:00.0: DRM-master: Device allocation failed: -5
    [    9.502661] nouveau 0000:1f:00.0: probe with driver nouveau failed with error -5
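
The contiguity argument above can be made concrete with a small
user-space sketch. It assumes 4 KiB pages and uses hypothetical helper
names; it only illustrates the reasoning (a page-aligned buffer no
larger than one page touches a single page, so it cannot be physically
discontiguous) and says nothing about what the GSP actually requires:

```python
from typing import Sequence

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def pages_spanned(offset: int, size: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages touched by `size` bytes starting at byte `offset`."""
    if size <= 0:
        return 0
    first = offset // page_size
    last = (offset + size - 1) // page_size
    return last - first + 1


def is_physically_contiguous(pfns: Sequence[int]) -> bool:
    """True if the page frame numbers are consecutive and ascending."""
    return all(b == a + 1 for a, b in zip(pfns, pfns[1:]))


if __name__ == "__main__":
    # A page-aligned, one-page buffer: always a single page.
    print(pages_spanned(0, PAGE_SIZE))
    # One byte more and it spans two pages, which may or may not be adjacent.
    print(pages_spanned(0, PAGE_SIZE + 1))
    print(is_physically_contiguous([100, 101, 102]))
    print(is_physically_contiguous([100, 205]))
```

Note that a buffer smaller than a page can still straddle two pages if
it is not page-aligned, so the "small enough" argument only holds for
page-aligned allocations.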


Which brings me to the second part - Timur had me enable
CONFIG_SG_DEBUG, which quickly hit a different issue:

    [    8.992320] RIP: 0010:sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1))
    [ 8.992331] Code: 71 93 37 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54 24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b 94 7d 00 <0f> 0b 0f 0b 0f 0b 48 8b 05 5e ae 9f 01 eb b2 66 66 2e 0f 1f 84 00
    [    8.992428] Call Trace:
    [    8.992433]  <TASK>
    [    8.992439] ? die (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/dumpstack.c:421 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/dumpstack.c:434 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/dumpstack.c:447)
    [    8.992448] ? do_trap (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/traps.c:114 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/traps.c:155)
    [    8.992455] ? sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1))
    [    8.992464] ? do_error_trap (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./arch/x86/include/asm/traps.h:58 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/traps.c:176)
    [    8.992472] ? sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1))
    [    8.992481] ? exc_invalid_op (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/traps.c:267)
    [    8.992489] ? sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1))
    [    8.992496] ? asm_exc_invalid_op (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./arch/x86/include/asm/idtentry.h:621)
    [    8.992509] ? sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1))
    [    8.992518] nvkm_firmware_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/firmware.c:249) nouveau
    [    8.992722] nvkm_falcon_fw_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/falcon/fw.c:199) nouveau
    [    8.992898] ga102_gsp_booter_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/ga102.c:62) nouveau
    [    8.993095] r535_gsp_oneinit (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:2309) nouveau
    [    8.993292] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224)
    [    8.993302] ? kmem_cache_alloc_lru (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/mm/slub.c:3748 (discriminator 2) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/mm/slub.c:3827 (discriminator 2) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/mm/slub.c:3864 (discriminator 2))
    [    8.993311] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224)
    [    8.993317] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224)
    [    8.993324] ? ktime_get (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/time/timekeeping.c:292 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/time/timekeeping.c:388 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/time/timekeeping.c:848)
    [    8.993334] nvkm_subdev_oneinit_ (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/subdev.c:113) nouveau
    [    8.993510] nvkm_subdev_init_ (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/subdev.c:139) nouveau
    [    8.993685] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224)
    [    8.993693] nvkm_subdev_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/subdev.c:170) nouveau
    [    8.993867] nvkm_device_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c:3023) nouveau
    [    8.994079] nvkm_udevice_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/engine/device/user.c:295) nouveau
    [    8.994281] nvkm_object_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/object.c:245) nouveau
    [    8.994457] nvkm_ioctl_new (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c:149) nouveau
    [    8.994630] ? __pfx_nvkm_client_child_new (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/client.c:125) nouveau
    [    8.994803] ? __pfx_nvkm_udevice_new (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/engine/device/user.c:386) nouveau
    [    8.995013] nvkm_ioctl (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c:354 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c:376) nouveau
    [    8.995187] nvif_object_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvif/object.c:298 (discriminator 1)) nouveau
    [    8.995356] nvif_device_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvif/device.c:56) nouveau
    [    8.995524] nouveau_cli_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nouveau_drm.c:270) nouveau
    [    8.995721] nouveau_drm_device_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nouveau_drm.c:602) nouveau
    [    8.995915] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224)
    [    8.995923] ? pci_bus_read_config_word (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/access.c:67 (discriminator 1))
    [    8.995932] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224)
    [    8.995939] ? pci_update_current_state (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci.c:1195 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci.c:1187)
    [    8.995949] nouveau_drm_probe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nouveau_drm.c:841) nouveau
    [    8.996145] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224)
    [    8.996154] local_pci_probe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci-driver.c:325)
    [    8.996163] pci_device_probe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci-driver.c:392 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci-driver.c:417 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci-driver.c:451 (discriminator 1))
    [    8.996174] really_probe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:578 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:656)
    [    8.996185] ? __pfx___driver_attach (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:1155)
    [    8.996192] __driver_probe_device (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:798)
    [    8.996201] driver_probe_device (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:828)
    [    8.996209] __driver_attach (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:1215)
    [    8.996217] bus_for_each_dev (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/bus.c:368)
    [    8.996228] bus_add_driver (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/bus.c:673)
    [    8.996238] driver_register (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/driver.c:246)
    [    8.996246] ? __pfx_nouveau_drm_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvif/object.c:32) nouveau
    [    8.996415] do_one_initcall (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/init/main.c:1238)
    [    8.996428] do_init_module (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:2538)
    [    8.996437] init_module_from_file (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3168)
    [    8.996450] idempotent_init_module (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/spinlock.h:351 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3131 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3185)
    [    8.996462] __x64_sys_finit_module (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/file.h:47 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3207 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3189 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3189)
    [    8.996473] do_syscall_64 (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/entry/common.c:52 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/entry/common.c:83 (discriminator 1))
    [    8.996482] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224)
    [    8.996490] entry_SYSCALL_64_after_hwframe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/entry/entry_64.S:129)
    [    8.996499] RIP: 0033:0x7fd12f52919d

I think Timur actually mentioned this bug to you previously, but in
hopes of getting something more useful out of SG_DEBUG I dug into this
problem a bit and ended up with what I believe is an actually correct
patch:

https://gitlab.freedesktop.org/lyudess/linux/-/commit/485f1fb62ddd4b42b60848eeb48206fef4376161

I think this patch is fine, and does solve the issue for me here if I enable SG_DEBUG.

Ben.


...unfortunately, fixing that issue on my system did not get SG_DEBUG
to give me any useful info.

Anyway - that brings me to ask 1: do you have any idea what might be
going on with the falcon boot issue I mentioned, or if I might just be
doing something wrong/silly with how I'm setting up memory in
nvkm_gsp_mem_ctor()?

And 2: if you have the time does that patch look correct? I'm happy to
submit it :)

Also 3: welcome back again :)



