Hi,

On Mon, Jan 07, 2019 at 09:50:52AM +0100, Lucas Stach wrote:
> Hi Guido,
>
> On Sunday, 30.12.2018 at 16:49 +0100, Guido Günther wrote:
> > Hi Lucas,
> > On Wed, Dec 19, 2018 at 03:45:38PM +0100, Lucas Stach wrote:
> > > Keep the page at address 0 as faulting to catch any potential state
> > > setup issues early.
> >
> > This is a nice idea! But applying this and making mesa hit that page
> > leads to the process hanging in D state over here on GC7000:
> >
> > # [ 242.726192] INFO: task kworker/u8:2:37 blocked for more than 120 seconds.
> > [ 242.733010] Not tainted 4.18.0-00129-gce2b21074b41 #504
> > [ 242.738795] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 242.746638] kworker/u8:2 D 0 37 2 0x00000028
> > [ 242.752144] Workqueue: events_unbound commit_work
> > [ 242.756860] Call trace:
> > [ 242.759318] __switch_to+0x94/0xd0
> > [ 242.762741] __schedule+0x1c0/0x6b8
> > [ 242.766239] schedule+0x40/0xa8
> > [ 242.769380] schedule_timeout+0x2f0/0x428
> > [ 242.773410] dma_fence_default_wait+0x1cc/0x2b8
> > [ 242.777951] dma_fence_wait_timeout+0x44/0x1b0
> > [ 242.782403] drm_atomic_helper_wait_for_fences+0x48/0x108
> > [ 242.787819] commit_tail+0x30/0x80
> > [ 242.791229] commit_work+0x20/0x30
> > [ 242.794642] process_one_work+0x1ec/0x458
> > [ 242.798659] worker_thread+0x48/0x430
> > [ 242.802331] kthread+0x130/0x138
> > [ 242.805557] ret_from_fork+0x10/0x1c
> >
> > This is in dmesg showing that we hit the first page:
> >
> > [ 65.907388] etnaviv-gpu 38000000.gpu: MMU fault status 0x00000002
> > [ 65.913497] etnaviv-gpu 38000000.gpu: MMU 0 fault addr 0x00000e40
> >
> > Without that patch it's sampling random data from that page but does not hang.
>
> GPU hangs after a MMU fault are expected or more accurately, we
> actively request the GPU to stop by setting the exception bit in the
> page table.

Yeah. I put that in to show that this is the cause of the trouble above.
>
> A hanging GPU should trigger the scheduler timeout handler, which then
> makes sure to get the GPU back into a working state. So if things don't
> progress after the fault for you either the timeout handler is buggy on
> GC7000, or the fence signaling is broken somehow. I'll take a look at
> this.

This isn't a top notch linux-next based tree yet, so if you're not seeing
this, let me forward port our stuff to that and report back again.

Cheers,
 -- Guido