What I want to know is what is calling your machine ‘localhorst’?

Sent from my iPhone

> On Nov 20, 2018, at 9:15 AM, bugzilla-daemon@xxxxxxxxxxxxxxx wrote:
>
> Comment # 47 on bug 105733 from Allan
> I have really bad news.
>
> I took a long time to answer because I literally sent ALL of the components
> in my PC in for warranty or replaced them.
>
> The CPU (R7 1800X) was replaced by AMD itself: my batch-21 chip was swapped
> for a new one from batch 35.
>
> But OK, let's talk about amdgpu:
>
> (In reply to Andrey Grodzovsky from comment #25)
> > (In reply to Allan from comment #12)
> > Can you build latest kernel (4.18) and grab again latest firmware and try
> > again ?
> > Links to kernel and firmware:
> > https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
> > https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/
>
> For reasons already explained here, I couldn't compile or test it before,
> so please don't be mad at me:
> - Sold my old PC.
> - My notebook's disk was completely full.
> - Components on warranty. Testing everything else.
>
> So I managed to borrow a PC to test the video cards. So far I have tested only
> the NVIDIA one, to show AMD that the GPU works and that it is the PCI
> controller of the CPU/chipset (a guess of mine) that is broken. I'm going to
> test the RX480 on this PC as soon as possible. My warranties are expiring and
> I had to set priorities.
>
> I already said it here, but with the 1800X I couldn't even clone the git
> repository (the checksum always fails; I tried many times).
>
> Then I managed to free some space on my notebook and started the build
> yesterday:
> - Included the amd-ucode firmware.
> - Included the polaris10 firmware (for the RX480).
> - Made some optimizations for Ryzen as described on Gentoo's dedicated
>   page.
>
> It compiled version 4.20-rc1, as present in the branch. No errors reported.
>
> There are 2 main applications that make it easy to reproduce the problems
> right now:
> - Metro 2033 Redux through Steam.
> - Left 4 Dead 2 through Steam.
>
> I started Metro 2033. It worked for some minutes with no issue, but for some
> reason there was no sound. Closed it, turned off the HDMI audio in pavucontrol
> to use only the default output, and restarted Steam.
>
> This time I started Left 4 Dead 2. I was able to set the graphics to max
> without AA and vsync. I played for 15 seconds and got a screen freeze. I waited
> for a script to properly record the logs and temps, then hard rebooted. This
> time even my BIOS/EFI screen had a green background, though it was still
> operational. Everything was green except the text. I rebooted again and got
> back to normal colors.
>
> And here are the logs:
>
> kern.log from Firefox usage:
>
> Nov 14 05:26:50 desk kernel: [ 324.714998] Chrome_~dThread[1788]: segfault at 0 ip 00007fbfee5e3181 sp 00007fbfec2d1ad0 error 6 in libxul.so[7fbfee5cf000+3a2c000]
>
> This suggests that the CPU either still has problematic microcode or is
> defective.
>
> dmesg from the amdgpu screen freeze:
>
> [ 3323.920795] amdgpu 0000:09:00.0: GPU fault detected: 146 0x0000080c for process hl2_linux pid 14648 thread amdgpu_cs:0 pid 14653
> [ 3323.920799] amdgpu 0000:09:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
> [ 3323.920801] amdgpu 0000:09:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0200800C
> [ 3323.920804] amdgpu 0000:09:00.0: VM fault (0x0c, vmid 1, pasid 32774) at page 0, read from 'TC0' (0x54433000) (8)
> [ 3334.103233] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=274140, emitted seq=274142
> [ 3334.103239] amdgpu 0000:09:00.0: GPU reset begin!
> [ 3344.332607] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:46:crtc-0] hw_done or flip_done timed out
> [ 3504.834097] INFO: task kworker/u32:2:3872 blocked for more than 120 seconds.
> [ 3504.834103] Not tainted 4.20.0-rc1-amd #2
> [ 3504.834105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 3504.834107] kworker/u32:2 D 0 3872 2 0x80000000
> [ 3504.834123] Workqueue: events_unbound commit_work [drm_kms_helper]
> [ 3504.834126] Call Trace:
> [ 3504.834133] ? __schedule+0x2a0/0x880
> [ 3504.834136] schedule+0x28/0x80
> [ 3504.834139] schedule_timeout+0x25d/0x380
> [ 3504.834217] ? dce110_timing_generator_get_position+0x5b/0x70 [amdgpu]
> [ 3504.834292] ? dce110_timing_generator_get_crtc_scanoutpos+0x70/0xb0 [amdgpu]
> [ 3504.834297] dma_fence_default_wait+0x23b/0x2a0
> [ 3504.834301] ? dma_fence_release+0x90/0x90
> [ 3504.834304] dma_fence_wait_timeout+0xdd/0x100
> [ 3504.834308] reservation_object_wait_timeout_rcu+0x161/0x270
> [ 3504.834387] amdgpu_dm_do_flip+0x112/0x370 [amdgpu]
> [ 3504.834468] amdgpu_dm_atomic_commit_tail+0x68b/0xcd0 [amdgpu]
> [ 3504.834472] ? __switch_to_asm+0x40/0x70
> [ 3504.834475] ? wait_for_completion_timeout+0x3b/0x1a0
> [ 3504.834477] ? __switch_to_asm+0x34/0x70
> [ 3504.834480] ? __switch_to_asm+0x40/0x70
> [ 3504.834483] ? __switch_to+0x1ba/0x450
> [ 3504.834492] commit_tail+0x3d/0x70 [drm_kms_helper]
> [ 3504.834497] process_one_work+0x1aa/0x3a0
> [ 3504.834500] worker_thread+0x30/0x3a0
> [ 3504.834503] ? drain_workqueue+0x130/0x130
> [ 3504.834506] kthread+0x11d/0x140
> [ 3504.834509] ? kthread_park+0x80/0x80
> [ 3504.834512] ret_from_fork+0x22/0x40
> [ 3516.645267] WARNING: CPU: 14 PID: 14694 at kernel/kthread.c:501 kthread_park+0x6c/0x80
> [ 3516.645271] Modules linked in: fuse edac_mce_amd kvm_amd nls_ascii nls_cp437 vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec joydev amdgpu snd_hda_core snd_hwdep chash gpu_sched snd_pcm snd_timer ttm drm_kms_helper snd drm i2c_algo_bit sp5100_tco soundcore kvm efi_pstore efivars sg irqbypass evdev wmi_bmof serio_raw pcspkr k10temp ccp tpm_crb pcc_cpufreq tpm_tis tpm_tis_core tpm rng_core acpi_cpufreq button parport_pc ppdev lp parport efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto btrfs xor zstd_decompress zstd_compress xxhash raid6_pq libcrc32c crc32c_generic algif_skcipher af_alg dm_crypt dm_mod sd_mod hid_generic usbhid hid uas usb_storage crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ahci xhci_pci aes_x86_64 libahci crypto_simd xhci_hcd cryptd glue_helper libata r8169 i2c_piix4 libphy usbcore scsi_mod thermal wmi gpio_amdpt gpio_generic
> [ 3516.645324] CPU: 14 PID: 14694 Comm: TaskSchedulerFo Not tainted 4.20.0-rc1-amd #2
> [ 3516.645327] Hardware name: BIOSTAR Group X370GT7/X370GT7, BIOS 5.13 08/07/2018
> [ 3516.645330] RIP: 0010:kthread_park+0x6c/0x80
> [ 3516.645333] Code: 18 e8 88 6c 67 00 be 40 00 00 00 48 89 df e8 8b c3 00 00 48 85 c0 74 1b 31 c0 5b 5d c3 0f 0b eb ae 0f 0b b8 da ff ff ff eb f0 <0f> 0b b8 f0 ff ff ff eb e7 0f 0b eb e3 0f 1f 80 00 00 00 00 0f 1f
> [ 3516.645335] RSP: 0018:ffffbafdc3fcfb60 EFLAGS: 00010202
> [ 3516.645338] RAX: 0000000000000004 RBX: ffff9dcd93f140c0 RCX: dead000000000200
> [ 3516.645339] RDX: ffff9dcd92ba7430 RSI: ffff9dcd93f140c0 RDI: ffff9dcd8a9049c0
> [ 3516.645341] RBP: ffff9dcd940a5360 R08: ffff9dcd96da25a8 R09: 0000000000000000
> [ 3516.645343] R10: 0000000000000000 R11: 000000000000019c R12: ffff9dcd92ba27a0
> [ 3516.645344] R13: ffff9dcd76d34200 R14: 0000000000000206 R15: dead000000000100
> [ 3516.645347] FS: 00007efea483e700(0000) GS:ffff9dcd96d80000(0000) knlGS:0000000000000000
> [ 3516.645349] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3516.645351] CR2: 00005654fe725e10 CR3: 0000000200d40000 CR4: 00000000003406e0
> [ 3516.645352] Call Trace:
> [ 3516.645362] drm_sched_entity_fini+0x37/0x190 [gpu_sched]
> [ 3516.645423] amdgpu_vm_fini+0xad/0x530 [amdgpu]
> [ 3516.645429] ? idr_destroy+0x78/0xc0
> [ 3516.645481] amdgpu_driver_postclose_kms+0x151/0x270 [amdgpu]
> [ 3516.645496] drm_file_free.part.5+0x21f/0x300 [drm]
> [ 3516.645510] drm_release+0xaa/0x120 [drm]
> [ 3516.645514] __fput+0xac/0x1e0
> [ 3516.645518] task_work_run+0x8f/0xb0
> [ 3516.645522] do_exit+0x2e6/0xb30
> [ 3516.645525] do_group_exit+0x3a/0xb0
> [ 3516.645528] get_signal+0x27a/0x5f0
> [ 3516.645532] do_signal+0x30/0x6d0
> [ 3516.645537] exit_to_usermode_loop+0x89/0xf0
> [ 3516.645540] do_syscall_64+0xda/0xe0
> [ 3516.645544] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 3516.645547] RIP: 0033:0x7efeb6b9d19a
> [ 3516.645553] Code: Bad RIP value.
> [ 3516.645555] RSP: 002b:00007efea483d810 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> [ 3516.645557] RAX: fffffffffffffdfc RBX: 00007efea483d958 RCX: 00007efeb6b9d19a
> [ 3516.645559] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007efea483d980
> [ 3516.645560] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffe661d7080
> [ 3516.645562] R10: 00007efea483d860 R11: 0000000000000246 R12: 0000000000000000
> [ 3516.645564] R13: 00007efea483d980 R14: 00007efea483d990 R15: 00007efea483d930
> [ 3516.645566] ---[ end trace 7da35ac4aa65c90d ]---
>
> Note that the fault code that appears most often with generic kernels is
> 147, rather than the 146 shown here.
>
> Xorg.0.log reports nothing.
> > I said that these were bad news because seems to me that both CPU and amdgpu > driver are defective. > > I noticed that while running kernel 4.18 the gpu is kept at 100% (mclk and > sclk) all the time while with this new kernel the GPU tries to scale the > performance. > > Also, it is important to note that the nvidia GTX 1070 throws a lot of xid > error codes ( see > https://devtalk.nvidia.com/default/topic/1043483/linux/xid-errors-on-gtx-1070-linux/post/5293440 > ). And this is why I'm thinking that the 1800X has a defective pci-controller. > And it is also the second part of the "really bad news". Maybe it is happening > mostly with ryzen processors? I'll test the RX480 with the other computer ASAP, > need to send informations about the CPU for AMD to proceed with the warranty > process. > > The GTX 1070 works without a single problem outside of this PC. The other cards > that I had tested before follows the same pattern ( 2 RX480, 1 RX 580, 1 GTX > 970, 1 GTX 1070). > > Currently I have only 1 RX480 and 1 GTX 1070. Now that I know that the cards > don't have any problem I'm selling the cards and soon I'll have only one or > none. The seller told me off because of requesting warranty for the RX 480 when > I thought it was defective, he sent me another different and the one that I > sent was working without any issues according to him. > > I'm already in a new stage of re-sending the CPU for AMD, and praying to solve > my endless torment. I think that they'll have to refund me (and then I'll have > a loss with the motherboard). > > Please tell me any other step that you may want to be done. > > I can also provide a full description of the kernel compilation (parameters) > and even provide a link to the generated .deb packages. > You are receiving this mail because: > You are the assignee for the bug. 
> _______________________________________________
> dri-devel mailing list
> dri-devel@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
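The fault lines quoted in the bug comment can be decoded by hand. Below is a minimal Python sketch of how. Treat it as a sketch under stated assumptions, not amdgpu's own decoder. The helper names are hypothetical. The bit layout used for `VM_CONTEXT1_PROTECTION_FAULT_STATUS` is assumed from GMC v8-era (Polaris) amdgpu register fields, and it reproduces what the driver itself printed for `0x0200800C`: protections 0x0c, vmid 1, a read, client id 8. The memory-client value `0x54433000` is treated as four ASCII bytes. The `error N` field in the kernel's segfault line uses the standard x86 page-fault error-code bits.

```python
"""Hypothetical helpers for reading the fault lines quoted in bug 105733."""
import re

# ASSUMED field layout of VM_CONTEXT1_PROTECTION_FAULT_STATUS (GMC v8 era):
# name -> (mask, shift). It matches the values amdgpu printed for 0x0200800C.
STATUS_FIELDS = {
    "protections": (0x000000FF, 0),
    "mc_id": (0x000FF000, 12),
    "rw": (0x01000000, 24),
    "vmid": (0x1E000000, 25),
}


def decode_vm_fault_status(status):
    """Split a raw status word into its (assumed) bit fields."""
    fields = {name: (status & mask) >> shift
              for name, (mask, shift) in STATUS_FIELDS.items()}
    # The RW bit distinguishes a write (1) from a read (0).
    fields["access"] = "write" if fields.pop("rw") else "read"
    return fields


def decode_mc_client(mc_client):
    """Interpret the memory-client value as 4 ASCII bytes, high byte first."""
    return mc_client.to_bytes(4, "big").rstrip(b"\0").decode("ascii")


# x86 page-fault error-code bits: (bit, meaning if set, meaning if clear).
PF_BITS = [
    (0x1, "protection violation", "page not present"),
    (0x2, "write", "read"),
    (0x4, "user mode", "kernel mode"),
]


def decode_pf_error(code):
    """Translate the 'error N' value of a segfault line into words."""
    return [on if code & bit else off for bit, on, off in PF_BITS]


# The kernel prints the address, ip and error code in hex.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) .*? "
    r"error (?P<err>[0-9a-f]+) in (?P<lib>\S+?)\[(?P<base>[0-9a-f]+)\+"
)


def decode_segfault(line):
    """Extract fault address, decoded error bits and offset into the library."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip, base = int(m["ip"], 16), int(m["base"], 16)
    return {
        "fault_address": int(m["addr"], 16),
        "error": decode_pf_error(int(m["err"], 16)),
        "library": m["lib"],
        "offset_in_library": hex(ip - base),
    }


if __name__ == "__main__":
    # Values copied from the dmesg and kern.log excerpts in the bug comment.
    print(decode_vm_fault_status(0x0200800C))
    print(decode_mc_client(0x54433000))
    print(decode_segfault(
        "Chrome_~dThread[1788]: segfault at 0 ip 00007fbfee5e3181 "
        "sp 00007fbfec2d1ad0 error 6 in libxul.so[7fbfee5cf000+3a2c000]"))
```

Run on the quoted values, it reports protections 12 (0x0c), client id 8, vmid 1 and a read from 'TC0' for the GPU fault. For the Firefox crash, it reports a user-mode write to a non-present page at address 0, at offset 0x14181 into libxul.so. That offset is what one would pass to `addr2line -e libxul.so 0x14181`, provided the libxul.so build has matching debug symbols.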