On 19/12/16 13:29, Chris Wilson wrote:
> On Mon, Dec 19, 2016 at 12:39:16PM +0100, Juergen Gross wrote:
>> With a recent 4.10 kernel the graphics isn't coming up under Xen. The
>> first failure message is:
>>
>> [   46.656649] i915 0000:00:02.0: swiotlb buffer is full (sz: 1630208 bytes)
>
> Do we get a silent failure? i915_gem_gtt_prepare_pages() is where we
> call dma_map_sg() and pass the sg to swiotlb (in this case) for
> remapping, and we do check for an error value of 0. After that error,
> SWIOTLB_MAP_ERROR is propagated back and converted to 0 for
> dma_map_sg(). That looks valid, and we should report ENOMEM back to the
> caller.
>
>> Later I see splats like:
>>
>> [   49.393583] general protection fault: 0000 [#1] SMP
>
> What was the faulting address? RAX is particularly non-pointer-like so I
> wonder if we walked onto an uninitialised portion of the sgtable. We may
> have tripped over a bug in our sg_page iterator.

During the bisect process there were either GPs, NULL pointer
dereferences, or other page faults. Typical addresses were:

xen_swiotlb_unmap_sg_attrs+0x1f/0x50: access to 0000000000000018
xen_swiotlb_unmap_sg_attrs+0x1f/0x50: access to 0000000003020118

> The attached patch should prevent an early ENOMEM following the swiotlb
> allocation failure. But I suspect that we will still be tripping up the
> failure in the sg walker when binding to the GPU.
> -Chris

The patch is working not too badly. :-) There are still several "swiotlb
buffer is full" messages (some with sz:, most without), but no faults
any more (neither GPs nor NULL pointer dereferences). Graphical login is
working now.

What I do see, however, is this (no idea whether it is related):

[  735.826492] INFO: task systemd-udevd:484 blocked for more than 120 seconds.
[  735.826497]       Tainted: G        W       4.9.0-pv+ #767
[  735.826499] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  735.826501] systemd-udevd   D    0   484    443 0x00000000
[  735.826507] Call Trace:
[  735.826522]  ? __schedule+0x192/0x640
[  735.826530]  ? kmem_cache_free+0x45/0x150
[  735.826535]  ? schedule+0x2d/0x80
[  735.826539]  ? schedule_timeout+0x1f3/0x380
[  735.826545]  ? error_exit+0x9/0x20
[  735.826555]  ? sg_pool_index.part.0+0x2/0x2
[  735.826561]  ? wait_for_completion+0xa4/0x110
[  735.826569]  ? wake_up_q+0x70/0x70
[  735.826577]  ? cpufreq_boost_online+0x10/0x10 [acpi_cpufreq]
[  735.826585]  ? cpuhp_issue_call+0x9c/0xe0
[  735.826590]  ? __cpuhp_setup_state+0xd5/0x1d0
[  735.826599]  ? acpi_cpufreq_init+0x1cd/0x1000 [acpi_cpufreq]
[  735.826601]  ? 0xffffffffa00b1000
[  735.826607]  ? do_one_initcall+0x38/0x180
[  735.826611]  ? kmem_cache_alloc_trace+0x98/0x1e0
[  735.826620]  ? do_init_module+0x55/0x1e5
[  735.826629]  ? load_module+0x2088/0x26b0
[  735.826633]  ? __symbol_put+0x30/0x30
[  735.826639]  ? SYSC_finit_module+0x80/0xb0
[  735.826644]  ? entry_SYSCALL_64_fastpath+0x1e/0xad

I guess it is _not_ related; OTOH, there is sg_pool_index() involved...

Juergen
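
For reference, the error handling Chris describes boils down to roughly
the following pattern. This is a minimal sketch only, assuming a device
and an already-built sg_table; example_prepare_pages() is an
illustrative stand-in, not the actual i915_gem_gtt_prepare_pages() body:

    #include <linux/dma-mapping.h>
    #include <linux/scatterlist.h>

    /*
     * Sketch of the error path discussed above (not the real i915
     * code): dma_map_sg() returns the number of DMA segments it
     * mapped, or 0 on failure. swiotlb's SWIOTLB_MAP_ERROR is what
     * gets converted to that 0, so the caller must translate it into
     * -ENOMEM rather than treating the sg table as mapped.
     */
    static int example_prepare_pages(struct device *dev, struct sg_table *st)
    {
            /* 0 means the mapping failed, e.g. because the swiotlb
             * bounce buffer was exhausted ("swiotlb buffer is full"). */
            if (dma_map_sg(dev, st->sgl, st->nents, DMA_BIDIRECTIONAL) == 0)
                    return -ENOMEM;

            return 0;
    }

If that failure were instead ignored and the sg table later handed to
the unmap path as if it had been mapped, walking its unmapped or
uninitialised entries could plausibly produce exactly the kind of
small-offset faults in xen_swiotlb_unmap_sg_attrs() reported above,
which would fit Chris's suspicion about the sg walker.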