Dear Daniel,

On 05/16/2019 01:52 PM, Daniel Kasak wrote:
> On Thu, May 16, 2019 at 11:43 AM Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
>
>> On Wed, May 15, 2019 at 8:33 PM Daniel Kasak <d.j.kasak.dk@xxxxxxxxx> wrote:
>>>
>>> On Mon, May 13, 2019 at 11:44 AM Daniel Kasak <d.j.kasak.dk@xxxxxxxxx> wrote:
>>>>
>>>> Hi all. I had version 2.2.0 of the ROCM stack running on a 5.0.x and
>>>> 5.1.0 kernel. Things were going great with various boinc GPU tasks. But
>>>> there is a setiathome GPU task which reliably gives me a hard lockup within
>>>> about 30 minutes of running. I actually had to do *two* emergency
>>>> re-installs over the past week. Perhaps part of this was my fault ( running
>>>> btrfs with lzo compression on my root partition ... ). But absolutely part
>>>> of this was the hard lockups. I've tested all kinds of other things ( eg
>>>> rebuilding lots of stuff under Gentoo ) ... I don't have a general
>>>> stability issue even under hours of high load. But after restarting boinc
>>>> with that same setiathome task ... <bang>!
>>>>
>>>> If someone wants me to sacrifice another installation, they can point
>>>> me to instructions for trying to gather more information.
>>>>
>>>> Anyway ... perhaps more work around detecting and recovering from GPU
>>>> lockups is in order?
>>> <sigh>
>>>
>>> That's what I was afraid of :(
>>
>> Not sure what you were afraid of. I don't think anyone has looked at
>> setiathome on ROCm. I'd suggest filing a bug
>> (https://bugs.freedesktop.org) and attaching your dmesg output and
>> xorg log (if using X). If there is a GPU reset, note that you will
>> need to restart your desktop environment because currently neither
>> glamor or any compositors support GL robustness extensions to reset
>> their contexts after a GPU reset.
> Hi Alex. dmesg output is not available ... this is a *hard* lockup. I need
> to power-cycle after it happens ( ALT + SysRq + { S , U , B } doesn't even
> work ). That's why I asked for instructions to possibly gather more info. I
> did check the xorg log after I did an emergency export of my filesystem ...
> nothing of interest in there. It seems like I currently don't really have
> enough info to make a bug report worthwhile.

Does your board have a serial port? If yes, please use the serial
console to gather the messages on another system.

Sometimes the netconsole [1] is also supposed to be able to send the
last Linux messages out.

Kind regards,

Paul

[1]: https://www.kernel.org/doc/Documentation/networking/netconsole.txt
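
For the netconsole route, a minimal sketch of what the second machine could run is below. It is only an illustration, not a tested setup: the helper names (`netconsole_param`, `listen`), the log file name, and the addresses in the usage note are all made up for the example. The parameter layout and the default ports 6665/6666 follow the format described in [1].

```python
import socket


def netconsole_param(src_ip, dev, tgt_ip, tgt_mac="", src_port=6665, tgt_port=6666):
    """Build the value for the netconsole= parameter.

    Layout per the netconsole documentation:
    src-port@src-ip/dev,tgt-port@tgt-ip/tgt-macaddr
    An empty tgt_mac leaves the field blank (the kernel then broadcasts).
    """
    return f"{src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"


def listen(port=6666, bind_addr="0.0.0.0", logfile="netconsole.log", max_packets=None):
    """Receive netconsole UDP datagrams and append them to a log file.

    max_packets is a hypothetical convenience for bounded runs;
    None means keep listening until interrupted.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    received = 0
    try:
        # Unbuffered binary append, so every message hits the file at once.
        with open(logfile, "ab", buffering=0) as log:
            while max_packets is None or received < max_packets:
                data, (sender, _) = sock.recvfrom(65535)
                log.write(data)
                print(f"[{sender}] {data.decode(errors='replace')}", end="")
                received += 1
    finally:
        sock.close()
    return received
```

With hypothetical addresses, `netconsole_param("192.168.1.10", "eth0", "192.168.1.20")` gives the string to pass on the machine that locks up, e.g. `modprobe netconsole netconsole=<that string>`, while the other machine runs `listen()`. Whether anything gets out before a hard lockup is not guaranteed, but the last messages received may at least hint at where it hangs.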
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx