[Bug 111763] ring_gfx hangs/freezes on Navi gpus

Comment # 24 on bug 111763 from
(In reply to wychuchol from comment #23)
> (In reply to wychuchol from comment #19)
> > After some time in Witcher 3 GOTY, run through Lutris, the PC restarts on its
> > own. I thought something was overheating (I noticed the graphics card memory
> > in PSensor sometimes reaching 90, so I thought that might be the cause),
> > but I checked kern.log and this always appeared before the spontaneous
> > reset:
> > 
> > Nov  2 22:01:53 pop-os kernel: [  979.244964] pcieport 0000:00:01.1: AER:
> > Corrected error received: 0000:01:00.0
> > Nov  2 22:01:53 pop-os kernel: [  979.244967] nvme 0000:01:00.0: AER: PCIe
> > Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
> > Nov  2 22:01:53 pop-os kernel: [  979.244968] nvme 0000:01:00.0: AER:  
> > device [1987:5012] error status/mask=00001000/00006000
> > Nov  2 22:01:53 pop-os kernel: [  979.244968] nvme 0000:01:00.0: AER:   
> > [12] Timeout               
> > Nov  2 22:01:53 pop-os kernel: [  979.262629] Emergency Sync complete
> 
> The thing with those AER errors is that they can go on and on, and the reset
> happens a few minutes after the last logged error.
> This might be overheating. I worked out how to write the sensors readings
> to a text log and found that memory went up to 96 C (or rather, it stayed
> there for about 1m 10s).
> Last reading before reset:
> amdgpu-pci-2800
> Adapter: PCI adapter
> vddgfx:       +1.16 V  
> fan1:        1551 RPM  (min =    0 RPM, max = 3200 RPM)
> edge:         +74.0°C  (crit = +118.0°C, hyst = -273.1°C)
>                        (emerg = +99.0°C)
> junction:     +88.0°C  (crit = +99.0°C, hyst = -273.1°C)
>                        (emerg = +99.0°C)
> mem:          +96.0°C  (crit = +99.0°C, hyst = -273.1°C)
>                        (emerg = +99.0°C)
> power1:      162.00 W  (cap = 195.00 W)
> 
> k10temp-pci-00c3
> Adapter: PCI adapter
> Tdie:         +70.5°C  (high = +70.0°C)
> Tctl:         +70.5°C  
> 
> Now the weird thing is: if this is in fact overheating, why didn't the fan go
> beyond 1600 RPM even once? The highest was about 1581 RPM, and I don't have the
> silent BIOS switched on (Sapphire Pulse RX 5700 XT, lever facing away from the
> video ports).

Okay, I no longer think it's overheating. I found a spot in Anomaly 1.5.0
that I can't get past without the system resetting: just before a psi storm in
Army Warehouses (I can provide a savefile).

Last sensors reading before the crash (logged at 5-second intervals):
amdgpu-pci-2800
Adapter: PCI adapter
vddgfx:       +1.01 V  
fan1:        1560 RPM  (min =    0 RPM, max = 3200 RPM)
edge:         +69.0°C  (crit = +118.0°C, hyst = -273.1°C)
                       (emerg = +99.0°C)
junction:     +84.0°C  (crit = +99.0°C, hyst = -273.1°C)
                       (emerg = +99.0°C)
mem:          +80.0°C  (crit = +99.0°C, hyst = -273.1°C)
                       (emerg = +99.0°C)
power1:      227.00 W  (cap = 195.00 W)

k10temp-pci-00c3
Adapter: PCI adapter
Tdie:         +71.8°C  (high = +70.0°C)
Tctl:         +71.8°C
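
For anyone who wants to collect the same kind of log, here is a minimal Python sketch of one way to do it. It is not the script used for the readings above. It assumes the lm-sensors `sensors` command is on the PATH and that its text output looks like the blocks pasted in this comment. The names `parse_sensors`, `over_limit`, `log_forever` and the `sensors.log` path are made up for illustration.

```python
import os
import re
import subprocess
import time

# "mem:   +80.0°C  (crit = +99.0°C, ...)" -> label "mem", value 80.0
READING = re.compile(r"^(\w+):\s+\+?(-?\d+(?:\.\d+)?)")
# Upper limits printed by sensors in the output above; "hyst", "min", "max" are skipped.
LIMIT = re.compile(r"\b(crit|emerg|cap|high)\s*=\s*\+?(-?\d+(?:\.\d+)?)")


def parse_sensors(text):
    """Return {label: (value, {limit_name: limit_value})} from sensors text output."""
    readings = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None  # a blank line ends a chip block
            continue
        match = READING.match(line)
        if match:
            current = match.group(1)
            readings[current] = (float(match.group(2)), {})
        if current is not None:
            # Limits may sit on the reading line or on an indented continuation line.
            for name, value in LIMIT.findall(line):
                readings[current][1][name] = float(value)
    return readings


def over_limit(readings):
    """List (label, value, limit_name, limit) where the value is at or above a limit."""
    hits = []
    for label, (value, limits) in readings.items():
        for name, limit in limits.items():
            if value >= limit:
                hits.append((label, value, name, limit))
    return hits


def log_forever(path="sensors.log", interval=5):
    """Append timestamped sensors output every `interval` seconds (hypothetical helper)."""
    with open(path, "a", encoding="utf-8") as log:
        while True:
            out = subprocess.run(["sensors"], capture_output=True, text=True).stdout
            log.write(time.strftime("%Y-%m-%d %H:%M:%S") + "\n" + out + "\n")
            log.flush()
            os.fsync(log.fileno())  # push to disk so the last entry survives a hard reset
            time.sleep(interval)


# Shortened copy of the last reading before the crash, taken from this comment.
SAMPLE = """\
amdgpu-pci-2800
Adapter: PCI adapter
junction:     +84.0°C  (crit = +99.0°C, hyst = -273.1°C)
                       (emerg = +99.0°C)
mem:          +80.0°C  (crit = +99.0°C, hyst = -273.1°C)
                       (emerg = +99.0°C)
power1:      227.00 W  (cap = 195.00 W)

k10temp-pci-00c3
Adapter: PCI adapter
Tdie:         +71.8°C  (high = +70.0°C)
Tctl:         +71.8°C
"""

for hit in over_limit(parse_sensors(SAMPLE)):
    print(hit)
```

On the sample above, only `power1` (227 W against a 195 W cap) and `Tdie` (71.8 against a high of 70.0) are reported, while the GPU temperatures stay below their printed limits. The `fsync` call is there because the machine resets without a clean shutdown, so buffered log lines could otherwise be lost.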


_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel
