Topaz doesn't support SRIOV.

> -----Original Message-----
> From: José Pekkarinen [mailto:jose.pekkarinen at canonical.com]
> Sent: Tuesday, December 19, 2017 3:12 PM
> To: Yu, Xiangliang <Xiangliang.Yu at amd.com>
> Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander
> <Alexander.Deucher at amd.com>; Koenig, Christian
> <Christian.Koenig at amd.com>
> Subject: Re: Topaz mistakenly reported as vf
>
> On Sunday, 17 December 2017 21:20:49 EET José Pekkarinen wrote:
> > Hi,
> >
> > I hit an issue that seems to be a topaz discrete vga reporting it's a
> > virtual function when my laptop is running on the battery. I received
> > the following backtrace:
> >
> > Dec 17 11:17:28 bee kernel: [ 31.976810] kernel BUG at drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c:310!
> > Dec 17 11:17:28 bee kernel: [ 31.976815] invalid opcode: 0000 [#1] SMP
> > Dec 17 11:17:28 bee kernel: [ 31.976831] Modules linked in: vfio_pci
> > vfio_virqfd udl loop bfq arc4 iwlmvm mac80211 kvmgt vfio_mdev
> > amdgpu(+) mdev vfio_iommu_type1 vfio i915 uvcvideo x86_pkg_temp_thermal
> > videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 intel_powerclamp
> > videobuf2_core coretemp videodev kvm_intel kvm i2c_algo_bit
> > rtsx_pci_sdmmc drm_kms_helper joydev mmc_core media mousedev
> > rtsx_pci_ms btusb btrtl btbcm memstick ttm drm wmi_bmof hci_uart
> > btintel bluetooth iwlwifi snd_hda_intel snd_hda_codec cfg80211
> > irqbypass crc32c_intel ghash_clmulni_intel intel_cstate snd_hwdep
> > intel_uncore snd_hda_core psmouse intel_rapl_perf rtsx_pci snd_pcm
> > efi_pstore evdev ideapad_laptop ac input_leds serio_raw efivars
> > sparse_keymap intel_lpss_acpi battery thermal ecdh_generic wmi fan
> > syscopyarea snd_timer sysfillrect snd rfkill intel_lpss
> > Dec 17 11:17:28 bee kernel: [ 31.977023] video sysimgblt tpm_crb
> > soundcore button mfd_core i2c_hid i2c_i801 fb_sys_fops backlight
> > acpi_pad efivarfs unix dm_zero dm_thin_pool dm_persistent_data
> > dm_bio_prison dm_service_time dm_round_robin dm_queue_length
> > dm_multipath dm_log_userspace cn dm_flakey dm_delay xts aesni_intel
> > crypto_simd cryptd glue_helper aes_x86_64 cbc sha256_generic
> > scsi_transport_iscsi r8169 mii fuse xfs nfs lockd grace sunrpc fscache
> > ext4 mbcache jbd2 multipath linear raid10 raid1 raid0 dm_raid raid456
> > md_mod async_raid6_recov async_memcpy async_pq async_xor xor async_tx
> > raid6_pq libcrc32c dm_snapshot dm_bufio dm_crypt dm_mirror
> > dm_region_hash dm_log dm_mod dax hid_generic usbhid xhci_pci xhci_hcd
> > ohci_hcd uhci_hcd usb_storage ehci_pci ehci_hcd usbcore usb_common
> > scsi_transport_fc sr_mod cdrom sg sd_mod ata_piix
> > Dec 17 11:17:28 bee kernel: [ 31.977223] ahci libahci sata_sx4 pata_oldpiix
> > Dec 17 11:17:28 bee kernel: [ 31.977239] CPU: 0 PID: 3698 Comm: udevd Not tainted 4.14.5 #10
> > Dec 17 11:17:28 bee kernel: [ 31.977255] Hardware name: LENOVO 80UV/Lenovo ideapad 510S-14IKB, BIOS 2SCN21WW(V2.01) 12/20/2016
> > Dec 17 11:17:28 bee kernel: [ 31.977278] task: ffff880358b54280 task.stack: ffffc900014dc000
> > Dec 17 11:17:28 bee kernel: [ 31.977323] RIP: 0010:xgpu_vi_init_golden_registers+0x56/0xa0 [amdgpu]
> > Dec 17 11:17:28 bee kernel: [ 31.977341] RSP: 0018:ffffc900014dfa08 EFLAGS: 00010293
> > Dec 17 11:17:28 bee kernel: [ 31.977356] RAX: 000000000000000a RBX: ffff880340040000 RCX: 0000000000000000
> > Dec 17 11:17:28 bee kernel: [ 31.977375] RDX: ffff880358b54280 RSI: 0000000000000100 RDI: ffff880340040000
> > Dec 17 11:17:28 bee kernel: [ 31.977394] RBP: ffffc900014dfa10 R08: ffff88033c6dd198 R09: 0000000000000000
> > Dec 17 11:17:28 bee kernel: [ 31.977413] R10: ffff880352c0aaa0 R11: 0000000000000008 R12: ffff880340040458
> > Dec 17 11:17:28 bee kernel: [ 31.977432] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880340040000
> > Dec 17 11:17:28 bee kernel: [ 31.977452] FS: 00007fbfdd8c0780(0000) GS:ffff88046ec00000(0000) knlGS:0000000000000000
> > Dec 17 11:17:28 bee kernel: [ 31.977474] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Dec 17 11:17:28 bee kernel: [ 31.977490] CR2: 000055c3b48c1408 CR3: 0000000358307003 CR4: 00000000003606f0
> > Dec 17 11:17:28 bee kernel: [ 31.977527] Call Trace:
> > Dec 17 11:17:28 bee kernel: [ 31.977555] vi_common_hw_init+0x77/0xe0 [amdgpu]
> > Dec 17 11:17:28 bee kernel: [ 31.977584] amdgpu_device_init+0xc4b/0x14b0 [amdgpu]
> > Dec 17 11:17:28 bee kernel: [ 31.977601] ? kmem_cache_alloc_trace+0x208/0x250
> > Dec 17 11:17:28 bee kernel: [ 31.977629] ? amdgpu_driver_load_kms+0x2a/0x1b0 [amdgpu]
> > Dec 17 11:17:28 bee kernel: [ 31.977658] amdgpu_driver_load_kms+0x4f/0x1b0 [amdgpu]
> > Dec 17 11:17:28 bee kernel: [ 31.977682] drm_dev_register+0x146/0x1d0 [drm]
> > Dec 17 11:17:28 bee kernel: [ 31.977710] amdgpu_pci_probe+0x118/0x140 [amdgpu]
> > Dec 17 11:17:28 bee kernel: [ 31.977725] pci_device_probe+0xcf/0x150
> > Dec 17 11:17:28 bee kernel: [ 31.977739] driver_probe_device+0x29c/0x450
> > Dec 17 11:17:28 bee kernel: [ 31.977753] __driver_attach+0xdf/0xf0
> > Dec 17 11:17:28 bee kernel: [ 31.978775] ? driver_probe_device+0x450/0x450
> > Dec 17 11:17:28 bee kernel: [ 31.979815] bus_for_each_dev+0x60/0xa0
> > Dec 17 11:17:28 bee kernel: [ 31.980882] driver_attach+0x1e/0x20
> > Dec 17 11:17:28 bee kernel: [ 31.981931] bus_add_driver+0x170/0x260
> > Dec 17 11:17:28 bee kernel: [ 31.982977] driver_register+0x60/0xe0
> > Dec 17 11:17:28 bee kernel: [ 31.984033] __pci_register_driver+0x5a/0x60
> > Dec 17 11:17:28 bee kernel: [ 31.985089] amdgpu_init+0x88/0x9b [amdgpu]
> > Dec 17 11:17:28 bee kernel: [ 31.986146] ? 0xffffffffa0c51000
> > Dec 17 11:17:28 bee kernel: [ 31.987192] do_one_initcall+0x52/0x190
> > Dec 17 11:17:28 bee kernel: [ 31.988229] ? kmem_cache_alloc_trace+0x208/0x250
> > Dec 17 11:17:28 bee kernel: [ 31.989270] ? do_init_module+0x27/0x202
> > Dec 17 11:17:28 bee kernel: [ 31.990308] ? do_init_module+0x27/0x202
> > Dec 17 11:17:28 bee kernel: [ 31.991383] do_init_module+0x5f/0x202
> > Dec 17 11:17:28 bee kernel: [ 31.992396] load_module+0x1511/0x1740
> > Dec 17 11:17:28 bee kernel: [ 31.993433] SyS_finit_module+0xc1/0x100
> > Dec 17 11:17:28 bee kernel: [ 31.994478] ? SyS_finit_module+0xc1/0x100
> > Dec 17 11:17:28 bee kernel: [ 31.995505] do_syscall_64+0x66/0x1a0
> > Dec 17 11:17:28 bee kernel: [ 31.996556] entry_SYSCALL64_slow_path+0x25/0x25
> > Dec 17 11:17:28 bee kernel: [ 31.997616] RIP: 0033:0x7fbfdcfd68f9
> > Dec 17 11:17:28 bee kernel: [ 31.998643] RSP: 002b:00007ffd31e4f848 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> > Dec 17 11:17:28 bee kernel: [ 31.999659] RAX: ffffffffffffffda RBX: 000055e4a76c8430 RCX: 00007fbfdcfd68f9
> > Dec 17 11:17:28 bee kernel: [ 32.000689] RDX: 0000000000000000 RSI: 00007fbfdd2a4565 RDI: 000000000000000e
> > Dec 17 11:17:28 bee kernel: [ 32.001736] RBP: 00007fbfdd2a4565 R08: 0000000000000000 R09: 00007ffd31e4f9c0
> > Dec 17 11:17:28 bee kernel: [ 32.002813] R10: 000000000000000e R11: 0000000000000246 R12: 0000000000000000
> > Dec 17 11:17:28 bee kernel: [ 32.003862] R13: 000055e4a76d6710 R14: 0000000000020000 R15: 000055e4a741b8e9
> > Dec 17 11:17:28 bee kernel: [ 32.004906] Code: 48 89 df ba 4b 00 00 00 48 c7 c6 60 62 13 a1 e8 11 b7 fc ff 48 89 df ba 1e 00 00 00 48 c7 c6 e0 61 13 a1 e8 fd b6 fc ff 5b 5d c3 <0f> 0b ba 05 01 00 00 48 c7 c6 c0 5d 13 a1 e8 e7 b6 fc ff 48 89
> > Dec 17 11:17:28 bee kernel: [ 32.006061] RIP: xgpu_vi_init_golden_registers+0x56/0xa0 [amdgpu] RSP: ffffc900014dfa08
> > Dec 17 11:17:28 bee kernel: [ 32.007226] ---[ end trace eb52a49a747a04be ]---
> >
> > Which in the end means we got to the following BUG_ON in
> > xgpu_vi_init_golden_registers:
> >
> >     BUG_ON("Doesn't support chip type.\n");
> >
> > Following the path in vi_init_golden_registers:
> >
> >     if (amdgpu_sriov_vf(adev)) {
> >         xgpu_vi_init_golden_registers(adev);
> >         mutex_unlock(&adev->grbm_idx_mutex);
> >         return;
> >     }
> >
> > System is using the following kernel and CPU:
> >
> > $ uname -a
> > Linux bee 4.14.5 #10 SMP Wed Dec 13 12:07:06 EET 2017 x86_64 Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz GenuineIntel GNU/Linux
> >
> > And the graphics card is the following:
> >
> > # lspci -vvvs 01:00.0
> > 01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445] (rev 81)
> >     Subsystem: Lenovo Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445]
> >     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> >     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >     Latency: 0, Cache Line Size: 64 bytes
> >     Interrupt: pin A routed to IRQ 128
> >     Region 0: Memory at a0000000 (64-bit, prefetchable) [size=256M]
> >     Region 2: Memory at b0000000 (64-bit, prefetchable) [size=2M]
> >     Region 4: I/O ports at 4000 [size=256]
> >     Region 5: Memory at b2300000 (32-bit, non-prefetchable) [size=256K]
> >     Expansion ROM at b2340000 [disabled] [size=128K]
> >     Capabilities: [48] Vendor Specific Information: Len=08 <?>
> >     Capabilities: [50] Power Management version 3
> >         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> >         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >     Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
> >         DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
> >             ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> >         DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> >             RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> >             MaxPayload 256 bytes, MaxReadReq 512 bytes
> >         DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> >         LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
> >             ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
> >         LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> >             ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >         LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >         DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
> >         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> >         LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> >             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> >             Compliance De-emphasis: -6dB
> >         LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
> >             EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
> >     Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> >         Address: 00000000fee00338 Data: 0000
> >     Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> >     Capabilities: [150 v2] Advanced Error Reporting
> >         UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >         UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >         UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >         CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> >         CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> >         AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> >     Capabilities: [270 v1] #19
> >     Capabilities: [2b0 v1] Address Translation Service (ATS)
> >         ATSCap: Invalidate Queue Depth: 00
> >         ATSCtl: Enable-, Smallest Translation Unit: 00
> >     Capabilities: [2c0 v1] Page Request Interface (PRI)
> >         PRICtl: Enable- Reset-
> >         PRISta: RF- UPRGI- Stopped+
> >         Page Request Capacity: 00000020, Page Request Allocation: 00000000
> >     Capabilities: [2d0 v1] Process Address Space ID (PASID)
> >         PASIDCap: Exec+ Priv+, Max PASID Width: 10
> >         PASIDCtl: Enable- Exec- Priv-
> >     Kernel driver in use: amdgpu
> >     Kernel modules: amdgpu
> >
> > Funny thing is that I can boot the machine properly when not running
> > on the battery, so either this seems to be a problem in the firmware,
> > or in the way ACPI interacts with the driver.
> >
> > Any help or ideas are appreciated.
> >
> > José.
>
> Adding Alex and Christian.
>
> Best regards.
>
> José