I have an RTX 3070 and a 3090, and I am absolutely sure I am binding vfio-pci to the 3090 and not the 3070.
I have bound the driver in two different ways: first by passing the IDs to the module, and alternatively by going through the sysfs interface and using the driver override (this is what I originally had to do when I used two 1080s, so I know it works).
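A minimal sketch of the override route described above, assuming the standard sysfs attributes driver_override, driver/unbind and drivers_probe. The helper name bind_with_override and its sysfs parameter are hypothetical, introduced only for illustration, and on a real system this has to run as root:

```python
import os

def bind_with_override(addr, driver="vfio-pci", sysfs="/sys"):
    """Bind one PCI device (e.g. a GPU function) to `driver` via driver_override."""
    dev = os.path.join(sysfs, "bus/pci/devices", addr)

    # 1. Tell the PCI core that only `driver` may claim this device.
    with open(os.path.join(dev, "driver_override"), "w") as f:
        f.write(driver)

    # 2. Detach the device from whatever driver currently owns it, if any.
    unbind = os.path.join(dev, "driver", "unbind")
    if os.path.exists(unbind):
        with open(unbind, "w") as f:
            f.write(addr)

    # 3. Ask the PCI core to probe the device again; the override
    #    set in step 1 decides which driver picks it up.
    with open(os.path.join(sysfs, "bus/pci/drivers_probe"), "w") as f:
        f.write(addr)
```

It would be called as bind_with_override("0000:0b:00.0"), where the address is a made-up placeholder and not taken from my system. A GPU usually exposes an audio function as well, which would need the same treatment.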
While the 3090 doesn't show a console, there is a leftover image from rEFInd (and from GRUB previously) on it.
The assessment Alex made previously, that aperture_remove_conflicting_pci_devices() is removing the driver (EFIFB) instead of the device, seems correct, but it could also be a quirk of how EFIFB is implemented. I recall reading a long time ago that EFIFB is a special device and that once it detects changes it simply gives up. There was also no way to attach a device to it again, as it depends on being preloaded outside the kernel; once something takes over the buffer, reinitializing is "impossible". I never went deeper to try to understand it.
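To double-check which driver each GPU is actually attached to, a read-only sketch like the following can help. It assumes the usual sysfs layout (the class and boot_vga attributes and the driver symlink under /sys/bus/pci/devices); the function name list_gpus is hypothetical. Treating boot_vga as a hint for which card the firmware framebuffer lives on is an assumption on my part, not something confirmed in this thread:

```python
import os

def list_gpus(sysfs="/sys"):
    """Return (pci_address, bound_driver, is_boot_vga) for each display device."""
    base = os.path.join(sysfs, "bus/pci/devices")
    rows = []
    for addr in sorted(os.listdir(base)):
        dev = os.path.join(base, addr)

        # PCI base class 0x03 covers display controllers (VGA, 3D, ...).
        with open(os.path.join(dev, "class")) as f:
            if not f.read().strip().startswith("0x03"):
                continue

        # "driver" is a symlink to the bound driver; absent if unbound.
        link = os.path.join(dev, "driver")
        driver = None
        if os.path.exists(link):
            driver = os.path.basename(os.path.realpath(link))

        # boot_vga is only present on VGA-class devices.
        boot_vga = None
        flag = os.path.join(dev, "boot_vga")
        if os.path.exists(flag):
            with open(flag) as f:
                boot_vga = f.read().strip() == "1"

        rows.append((addr, driver, boot_vga))
    return rows
```

Printing the result of list_gpus() before and after binding vfio-pci should show whether the card handed to the guest is also the one flagged as boot VGA.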
On Mon, Dec 5, 2022, 2:00 AM Thomas Zimmermann <tzimmermann@xxxxxxx> wrote:
Hi
On 05.12.22 at 01:51, Alex Williamson wrote:
> On Sat, 3 Dec 2022 17:12:38 -0700
> "mb@xxxxxxx" <mb@xxxxxxx> wrote:
>
>> Hi,
>>
>> I hope it is ok to reply to this old thread.
>
> It is, but the only relic of the thread is the subject. For reference,
> the latest version of this posted is here:
>
> https://lore.kernel.org/all/20220622140134.12763-4-tzimmermann@xxxxxxx/
>
> Which is committed as:
>
> d17378062079 ("vfio/pci: Remove console drivers")
>
>> Unfortunately, I found a
>> problem only now after upgrading to 6.0.
>>
>> My setup has multiple GPUs (2), and I depend on EFIFB to have a working console.
Which GPUs do you have?
>> pre-patch behavior, when I bind the vfio-pci to my secondary GPU both
>> the passthrough and the EFIFB keep working fine.
>> post-patch behavior, when I bind the vfio-pci to the secondary GPU,
>> the EFIFB disappears from the system, binding the console to the
>> "dummy console".
The efifb would likely use the first GPU. And vfio-pci should only
remove the generic driver from the second device. Are you sure that
you're not somehow using the first GPU with vfio-pci?
>> Whenever you try to access the terminal, you have the screen stuck in
>> whatever was the last buffer content, which gives the impression of
>> "freezing," but I can still type.
>> Everything else works, including the passthrough.
>
> This sounds like the call to aperture_remove_conflicting_pci_devices()
> is removing the conflicting driver itself rather than removing the
> device from the driver. Is it not possible to unbind the GPU from
> efifb before binding the GPU to vfio-pci to effectively nullify the
> added call?
>
>> I can only think about a few options:
>>
>> - Is there a way to have EFIFB show up again? After all it looks like
>> the kernel has just abandoned it, but the buffer is still there. I
>> can't find a single message about the secondary card and EFIFB in
>> dmesg, but there's a message for the primary card and EFIFB.
>> - Can we have a boolean controlling the behavior of vfio-pci
>> altogether or at least controlling the behavior of vfio-pci for that
>> specific ID? I know there's already some option for vfio-pci and VGA
>> cards, would it be appropriate to attach this behavior to that option?
>
> I suppose we could have an opt-out module option on vfio-pci to skip
> the above call, but clearly it would be better if things worked by
> default. We cannot make full use of GPUs with vfio-pci if they're
> still in use by host console drivers. The intention was certainly to
> unbind the device from any low level drivers rather than disable use of
> a console driver entirely. DRM/GPU folks, is that possibly an
> interface we could implement? Thanks,
When vfio-pci gives the GPU device to the guest, which driver is
bound to it?
Best regards
Thomas
>
> Alex
>
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev