On Sat, Jan 25, 2020 at 07:01:36PM +0000, Koenig, Christian wrote:
> On 25.01.2020 19:47, Andreas Messer <andi@xxxxxxxxxxxx> wrote:
> When backing up a ring, validate pointer to avoid page fault.
> [ cut description / kernel messages ]
>
> NAK, that was suggested multiple times now and is essentially the wrong
> approach.
>
> The problem is that the value is invalid because the hardware is not
> functional any more. Returning here without backing up the ring just
> papers over the real problem.
>
> This is just the first occurance of this and you would need to fix a
> couple of hundred register accesses (both inside and outside of the
> driver) to make that really work reliable.

Sure, it won't fix the hardware. But since the page fault is the most
prominent part of the kernel log, people will continue suggesting it.
With that change, the kernel messages are full of ring and ATOM BIOS
timeouts instead, which might make users more likely to consider a
hardware issue in the first place. Anyway:

> The only advice I can give you is to replace the hardware. From
> experience those symptoms mean that your GPU will die rather soon.

I think my hardware is fine. I have monitored GPU temperature and fan
PWM for a while now and found the PWM to be driven at only ~60% although
the GPU had already reached quite a high temperature during gameplay.
When forcing the PWM to ~80%, no crash occurs anymore. I suppose it is
not the GPU that is crashing but the VRMs, which are not getting enough
airflow.

I have compared the BIOS fan table of my card with those from other
cards' BIOSes (downloaded from the web) with the same GPU type and a
similar design. Although they differ in cooler construction and in the
fan used, all of them except one model have exactly the same fan
regulation points, with PWMHigh at 80% for 90°C. The single model with
different settings has 100% for this temperature and a generally much
saner-looking regulation curve.
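
To make the comparison of regulation points concrete, here is a minimal
sketch of how such a fan table could map temperature to PWM duty. It
assumes the curve is a clamped piecewise-linear interpolation between
(temperature, PWM) points, which the mail does not state. Only the two
90°C entries (80% and 100%) come from the text above; every other point
and all names (`fan_pwm`, `COMMON_CURVE`, `OUTLIER_CURVE`) are
hypothetical and chosen purely for illustration.

```python
def fan_pwm(temp_c, points):
    """Return the PWM duty (percent) for temp_c.

    points is a list of (temperature_C, pwm_percent) regulation points.
    Values between points are linearly interpolated; values outside the
    table are clamped to the first or last point.
    """
    pts = sorted(points)
    if temp_c <= pts[0][0]:
        return pts[0][1]
    for (t0, p0), (t1, p1) in zip(pts, pts[1:]):
        if temp_c <= t1:
            return p0 + (p1 - p0) * (temp_c - t0) / (t1 - t0)
    return pts[-1][1]


# Hypothetical tables: only the 90 C entries are taken from the mail.
COMMON_CURVE = [(45, 20), (65, 40), (90, 80)]    # PWMHigh 80% at 90 C
OUTLIER_CURVE = [(45, 20), (65, 50), (90, 100)]  # 100% at 90 C

if __name__ == "__main__":
    for temp in (50, 70, 80, 90, 95):
        print(temp, fan_pwm(temp, COMMON_CURVE), fan_pwm(temp, OUTLIER_CURVE))
```

Under this assumed model, a table capped at 80% never drives the fan
harder than that, however far the temperature rises past the last point.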

I suppose most of the vendors just copied some reference design; maybe
the vendor's Windows driver adjusts the curve to a better one, I don't
know. I think I'll add some sysfs attributes or a module parameter to
adjust the curve to my needs.

> [ Patch cut out ]

cheers,
Andreas
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx