On 30 September 2016 at 18:26, Laszlo Ersek <lersek@xxxxxxxxxx> wrote:
> On 09/30/16 18:38, Hans de Goede wrote:
>> Hi,
>>
>> On 30-09-16 17:33, Laszlo Ersek wrote:
>>> On 09/30/16 16:59, Hans de Goede wrote:
>>>> Hi,
>>>>
>>>> On 30-09-16 16:51, Laszlo Ersek wrote:
>>>>> On 09/30/16 12:35, Hans de Goede wrote:
>>>>>
>>>>>> Attached are 2 patches against the xserver which should fix this,
>>>>>> please give them a try.
>>>>>
>>>>> Sorry about the delay.
>>>>>
>>>>> The patches don't seem to fix the issue for me. Please see the Xorg log
>>>>> attached.
>>>>>
>>>>> I tested the patches as follows. Given that my bisection had been done
>>>>> in a Fedora 24 guest, using
>>>>>
>>>>>   xorg-x11-server-1.18.4-4.fc24
>>>>>   http://koji.fedoraproject.org/koji/buildinfo?buildID=794494
>>>>>
>>>>> I now rebuilt the guest kernel exactly at the failing commit (a325725
>>>>> "drm: Lobotomize set_busid nonsense for !pci drivers"), and first
>>>>> reproduced the issue with the above X server.
>>>>>
>>>>> Then, I ported your patches to "xorg-server-1.18.4" (using the upstream
>>>>> xserver tree), and rebuilt the Fedora package with the backport. For the
>>>>> backport, I had to cherry-pick the following two patches from master
>>>>> first:
>>>>>
>>>>>   1 ca8d88e50310 xfree86: recognize primary BUS_PCI device in
>>>>>     xf86IsPrimaryPlatform()
>>>>>   2 ea91db4b8331 config: fix GPUDevice fail when AutoAddGPU off + BusID
>>>>>
>>>>> This way your patches applied cleanly. (Cherry pick #1 above is actually
>>>>> necessary for semantics, while cherry pick #2 is needed for a clean
>>>>> context only, and has no impact for this test.)
>>>>>
>>>>> That is, in total, I added the following four patches to the Fedora 24
>>>>> package:
>>>>>
>>>>>   1 xfree86: recognize primary BUS_PCI device in xf86IsPrimaryPlatform()
>>>>>   2 config: fix GPUDevice fail when AutoAddGPU off + BusID
>>>>>   3 xfree86: Make adding unclaimed devices as GPU devices a separate step
>>>>>   4 xfree86: Try harder to find atleast 1 non GPU Screen
>>>>>
>>>>> You can find the scratch build that I used for testing here:
>>>>>
>>>>>   xorg-x11-server-1.18.4-4.hans_bz1366842_2.fc24
>>>>>   http://koji.fedoraproject.org/koji/taskinfo?taskID=15875087
>>>>>
>>>>> Another reason I used F24's X server as basis, rather than upstream
>>>>> HEAD, is that Fedora 24 is pretty young, and it's already on kernel
>>>>> 4.7.4, and I believe it will soon move to kernel 4.8, without
>>>>> (necessarily) rebasing its X server package to upstream. IOW the kernel
>>>>> upgrade to 4.8 will break X in Fedora 24 too, and then I expect the
>>>>> Fedora X maintainers would have to cherry pick those two patches as
>>>>> dependencies just the same.
>>>>>
>>>>> To summarize, the patches don't seem to help. I shall nonetheless thank
>>>>> you for spending your Friday on this!
>>>>
>>>> Hmm, do you have a xorg.conf file lying around somewhere, the message
>>>> about the xserver not being able to find an entry for screen 0 does
>>>> not make sense ...
>>>
>>> Good catch, I actually had two files under "/etc/X11/xorg.conf.d/":
>>>
>>> * "00-keyboard.conf", from package "systemd-229-13.fc24.x86_64", with
>>>   contents
>>>
>>> ------------
>>> # Read and parsed by systemd-localed. It's probably wise not to edit this file
>>> # manually too freely.
>>> Section "InputClass"
>>>     Identifier "system-keyboard"
>>>     MatchIsKeyboard "on"
>>>     Option "XkbLayout" "us"
>>> EndSection
>>> ------------
>>>
>>> * "01-resolution.conf", which I had created, in order to set the
>>>   preferred display resolution:
>>>
>>> ------------
>>> Section "Screen"
>>>     Identifier "Default Screen"
>>>     Device "Default Device"
>>>     Monitor "Default Monitor"
>>> EndSection
>>>
>>> Section "Device"
>>>     Identifier "Default Device"
>>>     Driver "modesetting"
>>> EndSection
>>>
>>> Section "Monitor"
>>>     Identifier "Default Monitor"
>>>     Option "PreferredMode" "640x480"
>>>     # Option "PreferredMode" "1440x900"
>>> EndSection
>>> ------------
>>>
>>> I removed these files now, and repeated the test. Again, the X server
>>> wouldn't start, but I think the log file looks a bit different now.
>>> Attached.
>>
>> Ah, ok so it seems that my initial analysis is wrong, the problem
>> is not a re-occuring of the device getting identified as a GPU screen,
>> libdrm sorta depends on bus-ids and the lack of one is causing the
>> server to misbehave. I guess that even with a xorg.conf things
>> will fail with the troublesome kernel version (might be worth
>> trying).
>>
>> Emil's analysis seems to be spot on. This does not seem easily
>> fixable in userspace / does seem like a real regression as it
>> even breaks things when specifying the device through xorg.conf
>> (I or so I believe) which is something which uses to work ...
>
> In order to check this hypothesis, I did the following:
> - I downgraded my xorg-x11-server installation to the most recent
>   official F24 packages, that is, "1.18.4-4.fc24",
> - I kept the kernel that I built exactly at the regressive commit
>   (a325725633c2)
> - I modified "01-resolution.conf" (see it above in the context) like this:
>
> ----
> Section "Device"
>     Identifier "Default Device"
>     Driver "modesetting"
>     BusID "PCI:00:02:0" <------------ new option added
> EndSection
> ----
>
> where BusID matches the B/D/F of the virtio-vga device from "lspci".
>
> This setup (modulo the kernel of course) was known to work, but now the
> X server actually segfaults (apparently in the
> xf86PlatformDeviceCheckBusID() function). Please find the logfile attached.
>
> (NB: this is unrelated to upstream commit de9ce6757c2e -- which the
> pristine FC24 build lacks -- because I don't set AutoAddGPU to "off" --
> it is left at its default "on" value.)
>

Where is this upstream commit again? It shows as unknown for the kernel,
xserver and libdrm.

So my theory was a bit off: SetVersion is the one responsible for setting
the "BusID", as retrieved by drmGetBusid, regardless of whether drmOpen or
open is used.

Here's a bit of a brain dump from the other day:
 - The commit mentioned 'affects' the drmSetBusid/DRM_IOCTL_SET_UNIQUE
   userspace codepaths.
 - That codepath is itself dri1/legacy (xserver hw/xfree86/dri/), which is
   not functional for platform devices. Judging by the kernel module,
   virt-gpu seems to be such a platform device.
 - The modesetting driver should not (and cannot) reach the above xserver
   codepath.

That said, it seems that (at least some) userspace expects a PCI device
despite the kernel module 'advertising' itself as a platform one :-\

Going through the xserver layers is a bit inspiring. I'm wondering if we
could get an strace before/after the xserver commit
ca8d88e50310a0d440a127c22a0a383cc149f408? It would help us track things
down a lot quicker/easier.
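As an aside on the BusID line quoted above: as far as I know, lspci prints the bus/device/function fields in hexadecimal, while the xorg.conf BusID string takes them in decimal ("PCI:bus:device:function"). For 00:02.0 the two happen to coincide, but for other slots they would not. Below is a minimal sketch of the conversion. The helper name lspci_to_xorg_busid is made up for illustration, and non-zero PCI domains are deliberately not handled.

```python
import re

# lspci-style address: optional 4-digit hex domain, then bus:device.function,
# all in hexadecimal, e.g. "00:02.0" or "0000:00:02.0".
_LSPCI_ADDR = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):(?P<dev>[0-9a-fA-F]{2})\.(?P<func>[0-7])$"
)


def lspci_to_xorg_busid(addr: str) -> str:
    """Convert an lspci address (hex) to an xorg.conf BusID string (decimal).

    Hypothetical helper for illustration only; it assumes PCI domain 0.
    """
    m = _LSPCI_ADDR.match(addr.strip())
    if m is None:
        raise ValueError(f"not an lspci-style address: {addr!r}")
    if m.group("domain") not in (None, "0000"):
        # Assumption of this sketch: only domain 0 is supported.
        raise ValueError("non-zero PCI domain is not handled in this sketch")
    bus = int(m.group("bus"), 16)
    dev = int(m.group("dev"), 16)
    func = int(m.group("func"), 16)
    return f"PCI:{bus}:{dev}:{func}"
```

For the virtio-vga device discussed here, lspci_to_xorg_busid("00:02.0") gives "PCI:0:2:0", which should be equivalent to the "PCI:00:02:0" used in the test above.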

Thanks
Emil
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel