Re: Looking for pointers on diagnosing ring test failure in amdgpu


 



Hi Matthew,

see inline below.

On 14.06.2016 at 00:03, Matthew Macy wrote:
  ---- On Mon, 13 Jun 2016 01:35:34 -0700 Christian König <christian.koenig@xxxxxxx> wrote ----
  > Hi Matthew,
  >
  > sounds like the UVD block doesn't want to initialize. No idea off hand
  > why, could be anything. I would need the hardware here for a closer
  > inspection.
  >
  > For a workaround you can try to disable the UVD block using the
  > ip_block_mask module parameter (it's a bitmask of enabled blocks e.g.
  > 0xffffffff means all blocks enabled, UVD is bit 7 on Carrizo IIRC).
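
To make the bitmask arithmetic concrete, here is a minimal C sketch of clearing one bit in a 32-bit enable mask. The helper name `disable_ip_block` and the `ALL_BLOCKS_ENABLED` macro are made up for illustration and are not part of the driver. The bit number 7 for UVD is taken from the message above, which itself flags it as "IIRC".

```c
#include <assert.h>
#include <stdint.h>

/* Assumption from the message above: all 32 bits set means every IP block is enabled. */
#define ALL_BLOCKS_ENABLED UINT32_C(0xffffffff)

/* Hypothetical helper: return `mask` with the bit for IP block `bit` cleared. */
static uint32_t disable_ip_block(uint32_t mask, unsigned bit)
{
    assert(bit < 32);                      /* the mask is only 32 bits wide */
    return mask & ~(UINT32_C(1) << bit);   /* clear that one bit, keep the rest */
}
```

Calling `disable_ip_block(ALL_BLOCKS_ENABLED, 7)` gives 0xffffff7f, which would be the value to pass as ip_block_mask if UVD really is bit 7 on this hardware.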


When I clear bit 7 I get the following now:

Jun 14 07:58:18 trainwreck kernel: drmn0: fence driver on ring 10 use gpu addr 0x00000000400000b0, cpu addr 0x0xfffff800bd4320b0
Jun 14 07:58:18 trainwreck kernel: drmn0: fence driver on ring 11 use gpu addr 0x00000000400000c0, cpu addr 0x0xfffff800bd4320c0
Jun 14 07:58:19 trainwreck kernel: drmn0: SMU check loaded firmware failed, expecting 0x17f, getting 0x0[drm:0xffffffff826d4dc4s] *ERROR* amdgpu: smc start failed
Jun 14 07:58:19 trainwreck kernel: [drm:0xffffffff8269fc40s] *ERROR* hw_init 3 failed -22
Jun 14 07:58:19 trainwreck kernel: drmn0: amdgpu_init failed

UVD is optional (as long as you don't want to do hardware video decoding) but the SMU isn't. Alex, Rex any idea what's going wrong here?

Which is hard to correlate without spending a lot more quality time with the driver than I've had time for yet.

Yeah, I don't see why some blocks should fail while others seem to initialize just fine. Especially since you reported it seems to work on other hardware.

One thing that occurs to me is that Linux is usually compiled with gcc6 - has amdgpu ever been tested as compiled with clang?

Not as far as I know. We had some problems in the past even with some gcc versions because of some odd things in the BIOS headers (e.g. zero sized arrays). But those issues should be fixed by now.

Below is a list of the warnings I have to disable in order to get amdgpu to compile without disabling -Werror altogether. The -Wno-format is an artifact of clang or FreeBSD treating long long and uint64_t as distinct types, and the -Wno-pointer-arith is to accept the Linux convention of doing pointer arithmetic on void pointers. All the others are arguably oversights in the code (similar silencing has to be done in i915, but I've had better luck with it to date). I haven't fixed the warnings because I try to treat it as vendor code and minimize any local changes. Will you accept quasi-cosmetic patches from other operating systems / compilers?
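
For readers unfamiliar with the two warnings explained above, the following is a small userspace C sketch of code that avoids both without any -Wno-* flag. It is an illustration only, not a proposed driver patch: the function names `format_gpu_addr` and `ptr_add` are hypothetical, and kernel code would normally use its own printf helpers rather than `<inttypes.h>`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * -Wformat: "%llx" expects unsigned long long, but uint64_t may be
 * unsigned long on an LP64 platform. PRIx64 expands to the conversion
 * that matches uint64_t on the platform being compiled for.
 */
static int format_gpu_addr(char *buf, size_t len, uint64_t addr)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "0x%016" PRIx64, addr);
}

/*
 * -Wpointer-arith: arithmetic on void * is a GNU extension. Casting to
 * char * first gives the same byte-wise offset in standard C.
 */
static void *ptr_add(void *base, size_t off)
{
    return (char *)base + off;
}
```

An alternative for the format case is to cast the argument to unsigned long long and keep "%llx", which leaves the format string unchanged across platforms.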

Yeah, sure feel free to provide patches. As long as it is only cleanup and not structural changes it should be trivial to get them merged.

Especially "-Wno-missing-prototypes" and "-Wno-unused-variable" sound like something which should be trivial to fix.

Regards,
Christian.


Thanks.

-M


CWARNFLAGS+=    -Wno-pointer-arith
CWARNFLAGS+=    -Wno-pointer-sign ${CWARNFLAGS.${.IMPSRC:T}}

CWARNFLAGS.amdgpu_acpi.c=       -Wno-int-conversion -Wno-missing-prototypes -Wno-unused-variable
CWARNFLAGS.amdgpu_amdkfd.c=     -Wno-missing-prototypes
CWARNFLAGS.amdgpu_bo_list.c=    -Wno-missing-prototypes
CWARNFLAGS.amdgpu_cs.c= -Wno-missing-prototypes
CWARNFLAGS.amdgpu_device.c=     -Wno-format -Wno-cast-qual
CWARNFLAGS.amdgpu_fence.c=      -Wno-format
CWARNFLAGS.amdgpu_gfx.c=        -Wno-missing-prototypes
CWARNFLAGS.amdgpu_amdkfd_gfx_v7.c=      -Wno-cast-qual
CWARNFLAGS.amdgpu_amdkfd_gfx_v8.c=      -Wno-cast-qual
CWARNFLAGS.amdgpu_atpx_handler.c=       -Wno-missing-prototypes
CWARNFLAGS.amdgpu_ih.c= -Wno-cast-qual
CWARNFLAGS.amdgpu_ioc32.c=      -Wno-missing-prototypes
CWARNFLAGS.amdgpu_object.c=     -Wno-format
CWARNFLAGS.amdgpu_mn.c=         -Wno-unused-variable
CWARNFLAGS.amdgpu_pll.c=        -Wno-missing-prototypes
CWARNFLAGS.amdgpu_pm.c=         -Wno-missing-prototypes -Wno-enum-conversion
CWARNFLAGS.amdgpu_ring.c=       -Wno-cast-qual
CWARNFLAGS.amdgpu_ttm.c=        -Wno-missing-prototypes
CWARNFLAGS.amdgpu_ucode.c=      -Wno-incompatible-pointer-types-discards-qualifiers -Wno-cast-qual
CWARNFLAGS.amdgpu_uvd.c=        -Wno-format
CWARNFLAGS.amdgpu_vce.c=        -Wno-format
CWARNFLAGS.amdgpu_vm.c=         -Wno-format
CWARNFLAGS.amdgpu_test.c=       -Wno-format
CWARNFLAGS.atombios_crtc.c=     -Wno-missing-prototypes
CWARNFLAGS.atombios_dp.c=       -Wno-format
CWARNFLAGS.atombios_i2c.c=      -Wno-missing-prototypes
CWARNFLAGS.ci_dpm.c=    -Wno-unused-const-variable
CWARNFLAGS.cz_smc.c=    -Wno-missing-prototypes
CWARNFLAGS.fiji_smc.c=  -Wno-cast-qual
CWARNFLAGS.gfx_v7_0.c=  -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.gfx_v8_0.c=  -Wno-missing-prototypes
CWARNFLAGS.iceland_smc.c=       -Wno-missing-prototypes
CWARNFLAGS.kv_dpm.c=    -Wno-unused-const-variable
CWARNFLAGS.tonga_smc.c= -Wno-cast-qual
CWARNFLAGS.gpu_scheduler.c=     -Wno-format -Wno-missing-prototypes
CWARNFLAGS.amd_powerplay.c=     -Wno-missing-prototypes
CWARNFLAGS.eventtasks.c=        -Wno-missing-prototypes
CWARNFLAGS.cz_clockpowergating.c=       -Wno-missing-prototypes -Wno-enum-conversion
CWARNFLAGS.cz_hwmgr.c=  -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.fiji_hwmgr.c=        -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.fiji_thermal.c=      -Wno-missing-prototypes
CWARNFLAGS.pp_acpi.c=   -Wno-missing-prototypes
CWARNFLAGS.ppatomctrl.c=        -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.processpptables.c=   -Wno-missing-prototypes -Wno-sometimes-uninitialized
CWARNFLAGS.tonga_clockpowergating.c=    -Wno-missing-prototypes -Wno-enum-conversion
CWARNFLAGS.tonga_hwmgr.c=       -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.tonga_processpptables.c=     -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.tonga_thermal.c=     -Wno-missing-prototypes
CWARNFLAGS.tonga_smumgr.c=      -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.fiji_smumgr.c=       -Wno-missing-prototypes -Wno-cast-qual





  >
  > Regards,
  > Christian.
  >
  > On 13.06.2016 at 03:35, Matthew Macy wrote:
  > >
  > > I'm trying to bring up amdgpu on a Carrizo A10 (Thinkpad e565 in case it matters) on FreeBSD. The driver is essentially unmodified from what is found in Linux 4.6 - relying on an extended version of FreeBSD's linuxkpi shims. The shims work well enough that i915/drm from 4.6 works extremely well on most hardware (I have yet to diagnose / fix the severe artifacts on Cherry Trail and Atom).
  > >
  > > On my A10 ring 11 test is failing:
  > >    https://gist.github.com/mattmacy/8e4a85072648eceb2445ad227dcc447c
  > >
  > > On my friend's A12 based EliteBook ring initialization succeeds:
  > > https://gist.github.com/mattmacy/d1fac64ab5190bb2568d6480dfbd7ee6
  > >
  > > With minor timing perturbations ring tests will fail as early as ring 0.
  > >
  > > I'm hoping that one of the amdgpu developers might give me pointers on how to diagnose further and/or what bugs in the linuxkpi might be causing this. I know that I can selectively disable the rings, but that doesn't help fix the underlying problem.
  > >
  > > Thanks in advance.
  > >
  > > -M
  > >
  > > _______________________________________________
  > > dri-devel mailing list
  > > dri-devel@xxxxxxxxxxxxxxxxxxxxx
  > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
  >
  >





