Comment # 13
on bug 108781
from jamespharvey20@gmail.com
In all seriousness, can the AMD devs please tell me exactly which make and model of video card they use? As long as it's something that has 3+ DisplayPorts and can run 5 monitors using chaining, I'd honestly rather buy that and be done with all this, and sell mine on eBay saying "windows only".

The symptom I see, and others are seeing, is that 4.18.16 boots to a tty just fine, while 4.19 goes to a black screen at the point where I'd expect it to automatically use KMS to switch to a higher resolution.

Bisecting between 4.18.16 and 4.19 unfortunately runs across multiple other amdgpu bugs that make this a tangled mess of spaghetti. Bisecting with "Do I get to see a tty on my monitor?" as the deciding factor for good/bad definitely ends with 0d998891 being bad and its parent, c91b007e, being good. I've confirmed this by booting each of them a bunch of times. See line 3 of the newly attached journalctl logs, which includes the auto-generated kernel version confirming this.

I really hope I'm wrong about this, but I don't think I've found the bug that makes my screen go black in 4.19. I say this because the journalctl differences illustrating what's wrong with 0d998891 do not show up in 4.19. I think the 0d998891 bug was fixed by a later commit, and that I haven't yet reached the bug I really care about in 4.19. The prospect of having to keep bisecting thousands of other commits between these versions, with the multiple amdgpu bugs discussed below, plus who knows how many other bugs that pop up and get fixed along the way, infuriates me.

This isn't just about complaining about bisecting. It's about what in the world I'm supposed to use as the deciding factor for "good" vs "bad". In commits more recent than 0d998891, the screen is going to be black a lot of the time, but I can't use that, because I'm hunting for the "other black screen" bug. There are so many errors in the 4.19 journalctl that I'd be comparing tons of journalctl logs, since I couldn't go by whether the screen is on; I'd maybe go by whether "amdgpu_device_ip_init failed" appears.
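If "amdgpu_device_ip_init failed" were used as the good/bad criterion instead of whether the screen lights up, each bisect step could be judged from a saved kernel log rather than by eye. A minimal sketch, assuming the log of the test boot has been saved to a text file (for example with "journalctl -k -b" over ssh, which still works on the black-screen boots); the helper names and the choice of marker are hypothetical, not an established tool.

```python
# Hypothetical good/bad criterion for a manual bisect: a boot is "bad"
# if the kernel log reports that amdgpu IP block initialisation failed.
BAD_MARKER = "amdgpu_device_ip_init failed"


def classify_boot(log_text: str) -> str:
    """Return "bad" if the failure marker appears anywhere in the log, else "good"."""
    return "bad" if BAD_MARKER in log_text else "good"


def error_lines(log_text: str) -> list[str]:
    """Collect the lines flagged *ERROR*, for comparing boots across bisect steps."""
    return [line for line in log_text.splitlines() if "*ERROR*" in line]


def classify_file(path: str) -> str:
    """Read a saved kernel log from disk and classify it."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return classify_boot(f.read())
```

The verdict would still be passed to "git bisect good" or "git bisect bad" by hand; since every step needs a build, install and reboot, "git bisect run" can't easily drive this unattended.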
But what if that isn't the deciding factor? I think all of this is why you were saying you don't think 0d998891 is the problem: the original 4.18.16 vs 4.19 journalctl logs I attached are showing a bug from somewhere else.

With multiple bugs popping in and back out, I honestly think AMD needs to revert all changes between 4.18.16 and 4.19, and only re-add them once it has actually tested the commits with its own products. The cards being discussed here are not unusual or old. I don't mind doing a bisect for an open source project once in a while, but I think having to go this deep is going too far, and since this is a bug in a company's code for its own product, rather than something like a filesystem bug, I don't feel like this depth of bug hunting should be on me.

If I'm wrong and 0d998891 is truly the source of the problem, and for some reason the 4.19 journalctl just doesn't show the errors at the bottom of this comment, then let me apologize and retract most of my rant here. But with its journalctl errors disappearing somewhere between it and 4.19, I don't feel like I'm wrong.

In my last comment, I thought it was at least possible that I had landed on the wrong commit at the very end, because I couldn't help but notice that the parent/good commit and the ones before it relate to vkms. With the worst symptom being a black screen at the KMS stage, it seemed to make sense that vkms was somehow turning my system into a headless system, making the screen black. But that's *NOT* what's happening. The parent/good commit has vkms=n. Although Arch 4.19 has vkms=m, I've been using Arch's 4.18 config, which doesn't even have vkms, so it winds up using the default of =n. (Furthermore, I've tested Arch 4.19 as-is but with vkms=n, and I still get a black screen.)

-----
Issue 1

We have to start somewhere, and the biggest issue to me right now is obviously the screen going black, preventing a tty.
Interestingly, with the 0d998891 (bad) commit, the system does boot and I can ssh in; it's just that all the screens are black. As I explained above, I don't know whether this turns out to be the cause of the 4.19 black screen.

-----
Issue 2

[drm] Invalid PCC GPIO: 13!

This error is a red herring as far as the usable screen / black screen issue is concerned. It appears in both 0d998891 (bad) and its parent c91b007e (good), so it comes from an earlier commit. No idea if it's harmful, but even with it present, booting c91b007e (good) to a tty works. So another bisect towards older commits would be needed to find what causes this.

-----
Issue 3 - Maybe an issue 4 or 5 in here too?

[drm:dm_pp_get_static_clocks [amdgpu]] *ERROR* DM_PPLIB: invalid powerlevel state: 0!
...
[drm:amdgpu_vce_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 12 test failed
[drm:amdgpu_device_init.cold.14 [amdgpu]] *ERROR* hw_init of IP block <vce_v2_0> failed -110
amdgpu 0000:03:00.0: amdgpu_device_ip_init failed
amdgpu 0000:03:00.0: Fatal error during GPU init
(stacktrace)

The rest of the errors in my original attachment, such as the ones briefly shown just above this paragraph, don't show up with either my good or my bad commit. So another bisect towards newer commits would be needed to find what causes these. Is this a single commit that introduces all of these errors? Could there be multiple commits causing all of this? Who knows.

-----
Deeper on issue 1, regarding this bad commit

I'm vimdiff'ing the newly attached journalctl logs after stripping the timestamps with ":%s/Nov 21 ..:..:.. //g".
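The vim substitution removes the "Nov 21 HH:MM:SS " prefix ("." matches any single character), so lines from two different boots compare equal whenever the message text matches. The same normalise-then-diff step can be sketched in Python with the standard "re" and "difflib" modules; the regex assumes every entry carries the same "Nov 21" date, as in these logs, and the function names are illustrative only.

```python
import difflib
import re

# Mirrors the vim substitution :%s/Nov 21 ..:..:.. //g
# ("." matches any single character, so this removes "Nov 21 HH:MM:SS ").
TIMESTAMP = re.compile(r"Nov 21 ..:..:.. ")


def strip_timestamps(lines: list[str]) -> list[str]:
    """Remove every timestamp occurrence so lines from different boots line up."""
    return [TIMESTAMP.sub("", line) for line in lines]


def journal_diff(good: list[str], bad: list[str]) -> list[str]:
    """Unified diff of two journals after timestamp stripping."""
    return list(
        difflib.unified_diff(
            strip_timestamps(good),
            strip_timestamps(bad),
            fromfile="good",
            tofile="bad",
            lineterm="",
        )
    )
```

Each journal would be loaded with something like open(path).read().splitlines() before being passed in; lines prefixed "-" exist only in the good boot and lines prefixed "+" only in the bad one.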
These are interesting (to me) differences:

archlinux kernel: Magic number: 10:966:801
archlinux kernel: acpi PNP0F03:00: hash matches
===good above becomes bad below - probably pseudo-random noise but not sure, so including===
archlinux kernel: Magic number: 10:413:850
archlinux kernel: index2: hash matches (line repeats 32 times, the number of cores I have)
archlinux kernel: processor cpu14: hash matches

Then at :1625 (good) and :1663 (bad), we see what changes between the good and bad commits regarding drm/fbcon.

[drm] amdgpu_dm_irq_schedule_work FAILED src 10
[drm] DM_MST: added connector: (____ptrval____) [id: 76] [master: (____ptrval____)]
[drm] fb mappable at 0xC05BC000
[drm] vram apper at 0xC0000000
[drm] size 14745600
[drm] fb depth is 24
[drm] pitch is 10240
fbcon: amdgpudrmfb (fb0) is primary device
switching from power state:
  ui class: performance
  internal class: none
  caps:
  uvd vclk: 0 dclk: 0
    power level 0 sclk: 76600 mclk: 150000 pcie gen: 3 pcie lanes: 16
    power level 1 sclk: 105000 mclk: 150000 pcie gen: 3 pcie lanes: 16
  status: c
switching to power state:
  ui class: performance
  internal class: none
  caps:
  uvd vclk: 0 dclk: 0
    power level 0 sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16
    power level 1 sclk: 105000 mclk: 150000 pcie gen: 3 pcie lanes: 16
  status: r
[drm] dce_get_required_clocks_state: clocks unsupported disp_clk 681000 pix_clk 241500
===good above becomes bad below===
[drm] amdgpu_dm_irq_schedule_work FAILED src 10
[drm] amdgpu_dm_irq_schedule_work FAILED src 8
[drm] amdgpu_dm_irq_schedule_work FAILED src 10
[drm] DM_MST: added connector: (____ptrval____) [id: 76] [master: (____ptrval____)]
[drm] Cannot find any crtc or sizes
[drm] amdgpu_dm_irq_schedule_work FAILED src 12
[drm] DM_MST: added connector: (____ptrval____) [id: 143] [master: (____ptrval____)]
[drm] Cannot find any crtc or sizes
[drm] DM_MST: added connector: (____ptrval____) [id: 220] [master: (____ptrval____)]
[drm] Cannot find any crtc or sizes
[drm] DM_MST: added connector: (____ptrval____) [id: 183] [master: (____ptrval____)]
[drm] DM_MST: added connector: (____ptrval____) [id: 236] [master: (____ptrval____)]
[drm] Cannot find any crtc or sizes
[drm] DM_MST: added connector: (____ptrval____) [id: 266] [master: (____ptrval____)]
[drm] Cannot find any crtc or sizes

My original comment gave the kernel parameters relating to radeon/amd; the journalctl logs had all of them. At first, I worried that abbreviating them in the comment might have thrown things off for the devs, because the "bad" commit has to do with fb, and I do use some fbcon kernel parameters. But trying my "bad" commit, and even Arch 4.19, without the fbcon kernel parameters still leads to a black screen. It's in the journalctl logs, but my full kernel command line is:

initrd=intel-ucode.img initrd=initramfs-linux.img root=/dev/lvm/arch rw consoleblank=0 fbcon=scrollback:128k fbcon=rotate:3 intel_iommu=on radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.dpm=1 amdgpu.dc=1
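The command line above mixes driver-scoped parameters ("radeon.*", "amdgpu.*"), which apply to the named driver, with plain ones such as "consoleblank=", "fbcon=" and "intel_iommu=", and some keys ("initrd", "fbcon") appear more than once. For the with/without-fbcon comparison described above, a small sketch that drops the "fbcon=" entries and groups the remaining tokens by driver prefix; the "(core)" label and the helper names are arbitrary choices for illustration, and quoted values are not handled.

```python
from collections import defaultdict

CMDLINE = (
    "initrd=intel-ucode.img initrd=initramfs-linux.img root=/dev/lvm/arch rw "
    "consoleblank=0 fbcon=scrollback:128k fbcon=rotate:3 intel_iommu=on "
    "radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.dpm=1 amdgpu.dc=1"
)


def without_fbcon(cmdline: str) -> str:
    """Drop every fbcon= option, keeping the rest in order (no quote handling)."""
    return " ".join(t for t in cmdline.split() if not t.startswith("fbcon="))


def group_by_module(cmdline: str) -> dict[str, list[str]]:
    """Group "module.param=value" tokens by module; other tokens go under "(core)"."""
    groups: dict[str, list[str]] = defaultdict(list)
    for token in cmdline.split():
        key = token.split("=", 1)[0]  # look for a dot only in the key, not the value
        module = key.split(".", 1)[0] if "." in key else "(core)"
        groups[module].append(token)
    return dict(groups)
```

Splitting on the key before looking for a dot matters here: "initrd=intel-ucode.img" contains a dot only in its value, so it stays with the plain parameters.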
_______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel