On 10/03/16 13:34, Emil Velikov wrote: > On 30 September 2016 at 18:26, Laszlo Ersek <lersek@xxxxxxxxxx> wrote: >> On 09/30/16 18:38, Hans de Goede wrote: >>> Hi, >>> >>> On 30-09-16 17:33, Laszlo Ersek wrote: >>>> On 09/30/16 16:59, Hans de Goede wrote: >>>>> Hi, >>>>> >>>>> On 30-09-16 16:51, Laszlo Ersek wrote: >>>>>> On 09/30/16 12:35, Hans de Goede wrote: >>>>>> >>>>>>> Attached are 2 patches against the xserver which should fix this, >>>>>>> please give them a try. >>>>>> >>>>>> Sorry about the delay. >>>>>> >>>>>> The patches don't seem to fix the issue for me. Please see the Xorg log >>>>>> attached. >>>>>> >>>>>> I tested the patches as follows. Given that my bisection had been done >>>>>> in a Fedora 24 guest, using >>>>>> >>>>>> xorg-x11-server-1.18.4-4.fc24 >>>>>> http://koji.fedoraproject.org/koji/buildinfo?buildID=794494 >>>>>> >>>>>> I now rebuilt the guest kernel exactly at the failing commit (a325725 >>>>>> "drm: Lobotomize set_busid nonsense for !pci drivers"), and first >>>>>> reproduced the issue with the above X server. >>>>>> >>>>>> Then, I ported your patches to "xorg-server-1.18.4" (using the upstream >>>>>> xserver tree), and rebuilt the Fedora package with the backport. For >>>>>> the >>>>>> backport, I had to cherry-pick the following two patches from master >>>>>> first: >>>>>> >>>>>> 1 ca8d88e50310 xfree86: recognize primary BUS_PCI device in >>>>>> xf86IsPrimaryPlatform() >>>>>> 2 ea91db4b8331 config: fix GPUDevice fail when AutoAddGPU off + BusID >>>>>> >>>>>> This way your patches applied cleanly. (Cherry pick #1 above is >>>>>> actually >>>>>> necessary for semantics, while cherry pick #2 is needed for a clean >>>>>> context only, and has no impact for this test.) >>>>>> >>>>>> That is, in total, I added the following four patches to the Fedora 24 >>>>>> package: >>>>>> >>>>>> 1 xfree86: recognize primary BUS_PCI device in xf86IsPrimaryPlatform() >>>>>> 2 config: fix GPUDevice fail when AutoAddGPU off + BusID >>>>>> 3 xfree86: Make adding unclaimed devices as GPU devices a separate step >>>>>> 4 xfree86: Try harder to find atleast 1 non GPU Screen >>>>>> >>>>>> You can find the scratch build that I used for testing here: >>>>>> >>>>>> xorg-x11-server-1.18.4-4.hans_bz1366842_2.fc24 >>>>>> http://koji.fedoraproject.org/koji/taskinfo?taskID=15875087 >>>>>> >>>>>> Another reason I used F24's X server as basis, rather than upstream >>>>>> HEAD, is that Fedora 24 is pretty young, and it's already on kernel >>>>>> 4.7.4, and I believe it will soon move to kernel 4.8, without >>>>>> (necessarily) rebasing its X server package to upstream. IOW the kernel >>>>>> upgrade to 4.8 will break X in Fedora 24 too, and then I expect the >>>>>> Fedora X maintainers would have to cherry pick those two patches as >>>>>> dependencies just the same. >>>>>> >>>>>> To summarize, the patches don't seem to help. I shall nonetheless thank >>>>>> you for spending your Friday on this! >>>>> >>>>> Hmm, do you have a xorg.conf file lying around somewhere, the message >>>>> about the xserver not being able to find an entry for screen 0 does >>>>> not make sense ... >>>> >>>> Good catch, I actually had two files under "/etc/X11/xorg.conf.d/": >>>> >>>> * "00-keyboard.conf", from package "systemd-229-13.fc24.x86_64", with >>>> contents >>>> >>>> ------------ >>>> # Read and parsed by systemd-localed. It's probably wise not to edit >>>> this file >>>> # manually too freely. >>>> Section "InputClass" >>>> Identifier "system-keyboard" >>>> MatchIsKeyboard "on" >>>> Option "XkbLayout" "us" >>>> EndSection >>>> ------------ >>>> >>>> * "01-resolution.conf", which I had created, in order to set the >>>> preferred display resolution: >>>> >>>> ------------ >>>> Section "Screen" >>>> Identifier "Default Screen" >>>> Device "Default Device" >>>> Monitor "Default Monitor" >>>> EndSection >>>> >>>> Section "Device" >>>> Identifier "Default Device" >>>> Driver "modesetting" >>>> EndSection >>>> >>>> Section "Monitor" >>>> Identifier "Default Monitor" >>>> Option "PreferredMode" "640x480" >>>> # Option "PreferredMode" "1440x900" >>>> EndSection >>>> ------------ >>>> >>>> I removed these files now, and repeated the test. Again, the X server >>>> wouldn't start, but I think the log file looks a bit different now. >>>> Attached. >>> >>> Ah, ok so it seems that my initial analysis is wrong, the problem >>> is not a re-occuring of the device getting identified as a GPU screen, >>> libdrm sorta depends on bus-ids and the lack of one is causing the >>> server to misbehave. I guess that even with a xorg.conf things >>> will fail with the troublesome kernel version (might be worth >>> trying). >>> >>> Emil's analysis seems to be spot on. This does not seem easily >>> fixable in userspace / does seem like a real regression as it >>> even breaks things when specifying the device through xorg.conf >>> (I or so I believe) which is something which uses to work ... >> >> In order to check this hypothesis, I did the following: >> - I downgraded my xorg-x11-server installation to the most recent >> official F24 packages, that is, "1.18.4-4.fc24", >> - I kept the kernel that I built exactly at the regressive commit >> (a325725633c2) >> - I modified "01-resolution.conf" (see it above in the context) like this: >> >> ---- >> Section "Device" >> Identifier "Default Device" >> Driver "modesetting" >> BusID "PCI:00:02:0" <------------ new option added >> EndSection >> ---- >> >> where BusID matches the B/D/F of the virtio-vga device from "lspci". >> >> This setup (modulo the kernel of course) was known to work, but now the >> X server actually segfaults (apparently in the >> xf86PlatformDeviceCheckBusID() function). Please find the logfile attached. >> >> (NB: this is unrelated to upstream commit de9ce6757c2e -- which the >> pristine FC24 build lacks -- because I don't set AutoAddGPU to "off" -- >> it is left at its default "on" value.) >> > Where is this upstream commit again - it shows as unknown for the > kernel, xserver and libdrm ? > > So my theory was a bit off - SetVersion is the one responsible to set > the "BusID", as retrieved by drmGetBusID, regardless if drmOpen or > open is used. > > Here's a bit of a brain dump from the other day: > > - The commit mentioned 'affects' the drmSetBusid/DRM_IOCTL_SET_UNIQUE > userspace codepaths. > - The latter itself is dri1/legacy (xserver hw/xfree86/dri/) which is > not functional for platform devices. > The latter of which seems to be the case for virt-gpu based on the > kernel module. > - The modesetting driver should/cannot reach the above xserver codepath > > That said, it seems that (at least some) userspace expects a PCI > device despite the kernel module 'advertising' itself as platform one > :-\ With kernel commit a325725633c2 applied, the drmGetBusid() call in get_drm_info() [hw/xfree86/os-support/linux/lnx_platform.c] returns "virtio0". Without kernel commit a325725633c2 in place, the same function call produces "pci:0000:00:02.0". I think that the idea of kernel commit a325725633c2: We already have a fallback in place to fill out the unique from dev->unique, which is set to something reasonable in drm_dev_alloc. is inappropriate for virtio devices. While it is true that "virtioN" is unique across the guest system, those identifiers are not stable. # ls -1d /sys/devices/pci*/*/virtio* /sys/devices/pci0000:00/0000:00:02.0/virtio0 /sys/devices/pci0000:00/0000:00:03.0/virtio1 /sys/devices/pci0000:00/0000:00:05.0/virtio2 /sys/devices/pci0000:00/0000:00:06.0/virtio3 /sys/devices/pci0000:00/0000:00:09.0/virtio4 If you tweak the PCI addresses of other virtio devices on the QEMU command line, while keeping the PCI address of the virtio-vga device intact, the "virtioN" identifier of the virtio-vga device may change in an unspecified / unexpected manner. >From the above listing, it seems like higher PCI Device numbers happen to end up with higher "virtioN" identifiers. While this is unspecified / non-guaranteed behavior, in practice it means the following: Moving the "virtio1" device from PCI address 00:03.0 to 00:01.0, while preserving the PCI address of virtio-vga as 00:02.0, will bump virtio-vga to "virtio1" from "virtio0" (and sink that other device from "virtio1" to "virtio0"). I think that unstable identifiers don't qualify for BusID use, regardless of the actual format of the IDs in question. Searching the patch quickly, drm_virtio_set_busid() is the only implementation of the "drm_driver.set_busid" callback that delegates the task to drm_pci_set_busid(). All other implementations of this callback were provided by drm_platform_set_busid(). drm_platform_set_busid() would ultimately format "dev->platformdev->name" and "dev->platformdev->id" into the bus identifier. Replacing that common logic with the drm_dev_alloc() fallback that is already in place is probably justified: judging from "virtio0", the fallback likely captures the exact same information, just with a different format. However, drm_virtio_set_busid() used to implement a different logic (encoding different information). It intentionally used stable PCI addresses rather than (platformdev->name, platformdev->id). So, I think that removing drm_virtio_set_busid() was an actual regression. I propose that the related hunks of a325725633c2 be reverted. For virtio-vga and virtio-gpu-pci, "virtioN" is an unstable and inappropraite BusID pattern. Thanks Laszlo _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel