On Fri, Sep 13, 2024 at 01:00:24PM GMT, Matthew Auld wrote:
> Spec says SW is expected to round up to the nearest 128K, if not already
> aligned for the CC unit view of CCS. We are seeing the assert sometimes
> pop on BMG to tell us that there is a hole between GSM and CCS, as well

Could you paste the warning here? I just grabbed a random BMG from the pile and
I'm seeing some possibly related warnings. And this patch didn't help:

[ 1109.275389] ------------[ cut here ]------------
[ 1109.275392] xe 0000:03:00.0: [drm] Assertion `offset == (xe_mmio_read64_2x32(&_Generic(gt, const struct xe_gt * : (const struct xe_tile *)((gt)->tile), struct xe_gt * : (gt)->tile)->mmio, ((const struct xe_reg){ .addr = 0x108100, })) - ccs_size)` failed!
               platform: BATTLEMAGE subplatform: 1
               graphics: Xe2_LPG / Xe2_HPG 20.01 step A0
               media: Xe2_LPM / Xe2_HPM 13.01 step A1
               Hole between CCS and GSM.
[ 1109.275415] WARNING: CPU: 6 PID: 3377 at drivers/gpu/drm/xe/xe_vram.c:188 tile_vram_size+0x26d/0x500 [xe]
[ 1109.275540] Modules linked in: xe(+) snd_hda_intel mei_gsc_proxy mei_gsc drm_gpuvm i2c_algo_bit drm_ttm_helper ttm gpu_sched drm_suballoc_helper drm_exec drm_display_helper drm_kunit_helpers drm_kms_helper kunit drm_buddy xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables br_netfilter bridge stp llc overlay sunrpc binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common snd_sof_pci_intel_tgl x86_pkg_temp_thermal snd_sof_pci_intel_cnl intel_powerclamp snd_sof_intel_hda_generic snd_sof_pci snd_sof_xtensa_dsp snd_sof_intel_hda_common coretemp snd_soc_hdac_hda cmdlinepart snd_sof_intel_hda snd_sof spi_nor kvm_intel snd_sof_utils mtd snd_soc_acpi_intel_match snd_soc_acpi kvm snd_intel_dspcfg snd_hda_codec snd_hwdep snd_sof_intel_hda_mlink rapl wmi_bmof snd_hda_ext_core intel_cstate snd_hda_core snd_soc_core snd_compress snd_pcm snd_timer nls_iso8859_1 snd i2c_i801
[ 1109.275604]  spi_intel_pci idma64 soundcore i2c_smbus spi_intel mei_pxp mei_hdcp intel_pmc_core input_leds video intel_vsec joydev pmt_telemetry wmi pmt_class acpi_tad acpi_pad mac_hid mei_me mei sch_fq_codel msr drm efi_pstore dm_multipath nfnetlink ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 hid_generic usbhid hid crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 r8169 realtek pinctrl_alderlake aesni_intel crypto_simd cryptd [last unloaded: xe]
[ 1109.275651] CPU: 6 UID: 0 PID: 3377 Comm: xe_module_load Kdump: loaded Tainted: G W 6.11.0-rc7-xe+ #1
[ 1109.275654] Tainted: [W]=WARN
[ 1109.275656] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
[ 1109.275657] RIP: 0010:tile_vram_size+0x26d/0x500 [xe]
[ 1109.275753] Code: 55 b0 41 52 8b 4d a8 51 8b 45 b8 48 c7 c1 a0 dc 1d a1 50 4c 8b 5d a0 41 53 44 8b 4d 9c 4c 8b 45 90 48 8b 55 88 e8 83 19 17 e0 <0f> 0b 48 83 c4 40 eb 11 49 8d 7d 20 be 00 81 10 00 e8 ed d0 fc ff
[ 1109.275755] RSP: 0018:ffffc90001a0b418 EFLAGS: 00010282
[ 1109.275757] RAX: 0000000000000000 RBX: ffffc90001a0b538 RCX: 0000000000000027
[ 1109.275759] RDX: ffff88885f321a08 RSI: 0000000000000001 RDI: ffff88885f321a00
[ 1109.275760] RBP: ffffc90001a0b4e0 R08: 0000000000000000 R09: 0000000000000003
[ 1109.275761] R10: ffffc90001a0b270 R11: ffff88885ebfffe8 R12: 0000000000000001
[ 1109.275763] R13: 000000027bc40000 R14: ffffffffa1219868 R15: ffff888161660078
[ 1109.275764] FS: 00007f02bcb28c40(0000) GS:ffff88885f300000(0000) knlGS:0000000000000000
[ 1109.275766] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1109.275767] CR2: 00005606882c0f00 CR3: 00000001307ee000 CR4: 0000000000750ef0
[ 1109.275769] PKRU: 55555554
[ 1109.275770] Call Trace:
[ 1109.275771] <TASK>
[ 1109.275773] ? show_regs+0x64/0x70
[ 1109.275778] ? __warn+0x8e/0x1a0
[ 1109.275783] ? tile_vram_size+0x26d/0x500 [xe]
[ 1109.275867] ? report_bug+0x171/0x1a0
[ 1109.275872] ? handle_bug+0x44/0x90
[ 1109.275876] ? exc_invalid_op+0x18/0x70
[ 1109.275879] ? asm_exc_invalid_op+0x1b/0x20
[ 1109.275886] ? tile_vram_size+0x26d/0x500 [xe]
[ 1109.275965] ? tile_vram_size+0x26d/0x500 [xe]
[ 1109.276046] xe_vram_probe+0xa1/0x860 [xe]

Is this the one you're talking about? I don't really remember seeing this
warning before, so maybe we let a regression in?

Lucas De Marchi

> as popping other asserts with having a vram size with strange alignment,
> which is likely caused by misaligned offset here.
>
> BSpec: 68023
> Fixes: b5c2ca0372dc ("drm/xe/xe2hpg: Determine flat ccs offset for vram")
> Signed-off-by: Matthew Auld <matthew.auld@xxxxxxxxx>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@xxxxxxxxx>
> Cc: Akshata Jahagirdar <akshata.jahagirdar@xxxxxxxxx>
> Cc: Shuicheng Lin <shuicheng.lin@xxxxxxxxx>
> Cc: Matt Roper <matthew.d.roper@xxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx> # v6.10+
> ---
>  drivers/gpu/drm/xe/xe_vram.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_vram.c b/drivers/gpu/drm/xe/xe_vram.c
> index 7e765b1499b1..8e65cb4cc477 100644
> --- a/drivers/gpu/drm/xe/xe_vram.c
> +++ b/drivers/gpu/drm/xe/xe_vram.c
> @@ -181,6 +181,7 @@ static inline u64 get_flat_ccs_offset(struct xe_gt *gt, u64 tile_size)
>  		offset = offset_hi << 32; /* HW view bits 39:32 */
>  		offset |= offset_lo << 6; /* HW view bits 31:6 */
> +		offset = round_up(offset, SZ_128K); /* SW must round up to nearest 128K */
>  		offset *= num_enabled; /* convert to SW view */
>
>  		/* We don't expect any holes */
> --
> 2.46.0
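
For reference, the arithmetic the one-line change introduces can be sketched in userspace C. This is only an illustration of the round-then-scale order in the hunk above, not the driver code: round_up_pow2() and ccs_offset_sw_view() are hypothetical names standing in for the kernel's round_up() and the tail of get_flat_ccs_offset(), and the input values used to exercise them are made up rather than taken from the log.

```c
#include <stdint.h>

#define SZ_128K 0x20000ULL

/*
 * Hypothetical userspace stand-in for the kernel's round_up().
 * Only valid when 'align' is a power of two, which SZ_128K is.
 */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper mirroring the order of operations in the patch:
 * round the HW-view offset up to 128K first, then multiply by the
 * number of enabled units to get the SW view.
 */
static uint64_t ccs_offset_sw_view(uint64_t offset_hw, unsigned int num_enabled)
{
	offset_hw = round_up_pow2(offset_hw, SZ_128K);
	return offset_hw * num_enabled;
}
```

With a made-up HW-view offset of 0x27bc4000 and num_enabled = 2, the rounding step gives 0x27be0000 and the SW view is 0x4f7c0000. Without the rounding step the result would be 0x4f788000, which is the kind of misaligned offset the commit message describes.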