On Thursday, 27.06.2019 at 15:32 +0100, Russell King - ARM Linux admin wrote:
> On Thu, Jun 27, 2019 at 11:04:17AM +0100, Russell King - ARM Linux admin wrote:
> > On Thu, Jun 27, 2019 at 11:20:15AM +0200, Lucas Stach wrote:
> > > On Saturday, 22.06.2019 at 17:16 +0100, Russell King - ARM Linux admin wrote:
> > > > While updating my various systems for the TCP SACK issue, I notice
> > > > that while most platforms are happy, the Cubox-i4 is not. During
> > > > boot, we get:
> > > >
> > > > [ 0.000000] cma: Reserved 256 MiB at 0x30000000
> > > > ...
> > > > [ 0.000000] Kernel command line: console=ttymxc0,115200n8 console=tty1 video=mxcfb0:dev=hdmi root=/dev/nfs rw cma=256M ahci_imx.hotplug=1 splash resume=/dev/sda1
> > > > [ 0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
> > > > [ 0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
> > > > [ 0.000000] Memory: 1790972K/2097152K available (8471K kernel code, 693K rwdata, 2844K rodata, 500K init, 8062K bss, 44036K reserved, 262144K cma-reserved, 1310720K highmem)
> > > > ...
> > > > [ 13.101098] etnaviv-gpu 130000.gpu: command buffer outside valid memory window
> > > > [ 13.171963] etnaviv-gpu 134000.gpu: command buffer outside valid memory window
> > >
> > > Yes, that's a regression due to different default CMA area placement
> > > and etnaviv not being smart enough to move the linear window to the
> > > right offset.
> >
> > As it's a user visible regression, it needs fixing, either by reverting
> > the changes that caused it or by some other issue. In the kernel, the
> > policy is "if a bug fix causes a regression, the bug fix was itself
> > wrong". We don't fix one person's bug if it causes a regression for
> > someone else.
> >
> > Please resolve the acknowledged regression.

The regression is caused by a different CMA placement, which is outside
of the control of etnaviv.
If you can point to the commit causing this change in placement we could
work with the authors/maintainers of this code to get rid of the
regression. Currently I don't have the bandwidth to pinpoint the
offending code change.

> > > > and shortly after the login prompt appears, the entire SoC appears to
> > > > lock up - it becomes unresponsive on the network, or via serial console
> > > > to sysrq requests.
> > > >
> > > > I suspect the GPU ends up scribbling over the CPU's vector page/kernel
> > > > as a result of the above two etnaviv errors when Xorg attempts to start
> > > > using the GPU.
> > >
> > > This should not be possible. The driver notices that the command buffer
> > > isn't accessible to the GPU, which aborts the GPU init. While the
> > > etnaviv DRM device is still accessible, it will not expose any
> > > enumerable GPU cores to userspace. So there is no way for userspace to
> > > actually submit GPU commands.
> >
> > Yep, I came to that conclusion. Nevertheless, if I allow Xorg to start
> > with 5.1, the system totally hangs shortly thereafter. I need to try
> > without etnaviv loaded at all.
>
> Well, it seems to get worse. I just tried to unload etnaviv, and was
> greeted by this oops. It's another regression; etnaviv used to unload
> perfectly fine. Please can you add module unload testing to your
> workflow?

As you can see from the patch I've just sent, this is a missing error
cleanup. So it's really the same regression. A module unload after
successful init of all GPU cores doesn't show this crash. The issue is
only unmasked due to the CMA placement regression.

Regards,
Lucas