Alex wrote: > > On Sun, Dec 19, 2021 at 11:41 AM Yann Dirson <ydirson@xxxxxxx> wrote: > > > > Christian wrote: > > > Am 19.12.21 um 17:00 schrieb Yann Dirson: > > > > Alex wrote: > > > >> Thinking about this more, I think the problem might be related > > > >> to > > > >> CPU > > > >> access to "VRAM". APUs don't have dedicated VRAM, they use a > > > >> reserved > > > >> carve out region at the top of system memory. For CPU access > > > >> to > > > >> this > > > >> memory, we kmap the physical address of the carve out region > > > >> of > > > >> system > > > >> memory. You'll need to make sure that region is accessible to > > > >> the > > > >> guest. > > > > So basically, the non-virt flow is is: (video?) BIOS reserves > > > > memory, marks it > > > > as reserved in e820, stores the physaddr somewhere, which the > > > > GPU > > > > driver gets. > > > > Since I suppose this includes the framebuffer, this probably > > > > has to > > > > occur around > > > > the moment the driver calls > > > > drm_aperture_remove_conflicting_pci_framebuffers() > > > > (which happens before this hw init step), right ? > > > > > > Well, that partially correct. The efifb is using the PCIe > > > resources > > > to > > > access the framebuffer and as far as I know we use that one to > > > kick > > > it out. > > > > > > The stolen memory we get over e820/registers is separate to that. How is the stolen memory communicated to the driver ? That host physical memory probably has to be mapped at the same guest physical address for the magic to work, right ? > > > > > > > ... which brings me to a point that's been puzzling me for some > > > > time, which is > > > > that as the hw init fails, the efifb driver is still using the > > > > framebuffer. > > > > > > No, it isn't. You are probably just still seeing the same screen. > > > > > > The issue is most likely that while efi was kicked out nobody > > > re-programmed the display hardware to show something different. > > > > > > > Am I right in suspecting that efifb should get stripped of its > > > > ownership of the > > > > fb aperture first, and that if I don't get a black screen on > > > > hw_init failure > > > > that issue should be the first focus point ? > > > > > > You assumption with the black screen is incorrect. Since the > > > hardware > > > works independent even if you kick out efi you still have the > > > same > > > screen content, you just can't update it anymore. > > > > It's not only that the screen keeps its contents, it's that the > > dom0 > > happily continues updating it. > > If the hypevisor is using efifb, then yes that could be a problem as > the hypervisor could be writing to the efifb resources which ends up > writing to the same physical memory. That applies to any GPU on a > UEFI system. You'll need to make sure efifb is not in use in the > hypervisor. That remark evokes several things to me. First one is that every time I've tried booting with efifb disabled in dom0, there was no visible improvements in the guest driver - i.i. I really have to dig how vram mapping is performed and check things are as expected anyway. The other is that, when dom0 cannot use efifb, entering a luks key is suddenly less user-friendly. But in theory I'd think we could overcome this by letting dom0 use efifb until ready to start the guest, a simple driver unbind at the right moment should be expected to work, right ? Going further and allowing the guest to use efifb on its own could possibly be more tricky (starting with a different state?) but does not seem to sound completely outlandish either - or does it ? > > Alex > > > > > > > But putting efi asside what Alex pointed out pretty much breaks > > > your > > > neck trying to forward the device. You maybe could try to hack > > > the > > > driver to use the PCIe BAR for framebuffer access, but that might > > > be > > > quite a bit slower. > > > > > > Regards, > > > Christian. > > > > > > > > > > >> Alex > > > >> > > > >> On Mon, Dec 13, 2021 at 3:29 PM Alex Deucher > > > >> <alexdeucher@xxxxxxxxx> > > > >> wrote: > > > >>> On Sun, Dec 12, 2021 at 5:19 PM Yann Dirson <ydirson@xxxxxxx> > > > >>> wrote: > > > >>>> Alex wrote: > > > >>>>> On Mon, Dec 6, 2021 at 4:36 PM Yann Dirson > > > >>>>> <ydirson@xxxxxxx> > > > >>>>> wrote: > > > >>>>>> Hi Alex, > > > >>>>>> > > > >>>>>>> We have not validated virtualization of our integrated > > > >>>>>>> GPUs. I > > > >>>>>>> don't > > > >>>>>>> know that it will work at all. We had done a bit of > > > >>>>>>> testing but > > > >>>>>>> ran > > > >>>>>>> into the same issues with the PSP, but never had a chance > > > >>>>>>> to > > > >>>>>>> debug > > > >>>>>>> further because this feature is not productized. > > > >>>>>> ... > > > >>>>>>> You need a functional PSP to get the GPU driver up and > > > >>>>>>> running. > > > >>>>>> Ah, thanks for the hint :) > > > >>>>>> > > > >>>>>> I guess that if I want to have any chance to get the PSP > > > >>>>>> working > > > >>>>>> I'm > > > >>>>>> going to need more details on it. A quick search some > > > >>>>>> time > > > >>>>>> ago > > > >>>>>> mostly > > > >>>>>> brought reverse-engineering work, rather than official AMD > > > >>>>>> doc. > > > >>>>>> Are > > > >>>>>> there some AMD resources I missed ? > > > >>>>> The driver code is pretty much it. > > > >>>> Let's try to shed some more light on how things work, taking > > > >>>> as > > > >>>> excuse > > > >>>> psp_v12_0_ring_create(). > > > >>>> > > > >>>> First, register access through [RW]REG32_SOC15() is > > > >>>> implemented > > > >>>> in > > > >>>> terms of __[RW]REG32_SOC15_RLC__(), which is basically a > > > >>>> [RW]REG32(), > > > >>>> except it has to be more complex in the SR-IOV case. > > > >>>> Has the RLC anything to do with SR-IOV ? > > > >>> When running the driver on a SR-IOV virtual function (VF), > > > >>> some > > > >>> registers are not available directly via the VF's MMIO > > > >>> aperture > > > >>> so > > > >>> they need to go through the RLC. For bare metal or > > > >>> passthrough > > > >>> this > > > >>> is not relevant. > > > >>> > > > >>>> It accesses registers in the MMIO range of the MP0 IP, and > > > >>>> the > > > >>>> "MP0" > > > >>>> name correlates highly with MMIO accesses in PSP-handling > > > >>>> code. > > > >>>> Is "MP0" another name for PSP (and "MP1" for SMU) ? The MP0 > > > >>>> version > > > >>> Yes. > > > >>> > > > >>>> reported at v11.0.3 by discovery seems to contradict the use > > > >>>> of > > > >>>> v12.0 > > > >>>> for RENOIR as set by soc15_set_ip_blocks(), or do I miss > > > >>>> something ? > > > >>> Typo in the ip discovery table on renoir. > > > >>> > > > >>>> More generally (and mostly out of curiosity while we're at > > > >>>> it), > > > >>>> do we > > > >>>> have a way to match IPs listed at discovery time with the > > > >>>> ones > > > >>>> used > > > >>>> in the driver ? > > > >>> In general, barring typos, the code is shared at the major > > > >>> version > > > >>> level. The actual code may or may not need changes to handle > > > >>> minor > > > >>> revision changes in an IP. The driver maps the IP versions > > > >>> from > > > >>> the > > > >>> ip discovery table to the code contained in the driver. > > > >>> > > > >>>> --- > > > >>>> > > > >>>> As for the register names, maybe we could have a short > > > >>>> explanation of > > > >>>> how they are structured ? Eg. mmMP0_SMN_C2PMSG_69: that > > > >>>> seems > > > >>>> to > > > >>>> be > > > >>>> a MMIO register named "C2PMSG_69" in the "MP0" IP, but I'm > > > >>>> not > > > >>>> sure > > > >>>> of the "SMN" part -- that could refer to the "System > > > >>>> Management > > > >>>> Network", > > > >>>> described in [0] as an internal bus. Are we accessing this > > > >>>> register > > > >>>> through this SMN ? > > > >>> These registers are just mailboxes for the PSP firmware. All > > > >>> of > > > >>> the > > > >>> C2PMSG registers functionality is defined by the PSP > > > >>> firmware. > > > >>> They > > > >>> are basically scratch registers used to communicate between > > > >>> the > > > >>> driver > > > >>> and the PSP firmware. > > > >>> > > > >>>> > > > >>>>> On APUs, the PSP is shared with > > > >>>>> the CPU and the rest of the platform. The GPU driver just > > > >>>>> interacts > > > >>>>> with it for a few specific tasks: > > > >>>>> 1. Loading Trusted Applications (e.g., trusted firmware > > > >>>>> applications > > > >>>>> that run on the PSP for specific functionality, e.g., HDCP > > > >>>>> and > > > >>>>> content > > > >>>>> protection, etc.) > > > >>>>> 2. Validating and loading firmware for other engines on the > > > >>>>> SoC. > > > >>>>> This > > > >>>>> is required to use those engines. > > > >>>> Trying to understand in more details how we start the PSP > > > >>>> up, I > > > >>>> noticed > > > >>>> that psp_v12_0 has support for loading a sOS firmware, but > > > >>>> never > > > >>>> calls > > > >>>> init_sos_microcode() - and anyway there is no sos firmware > > > >>>> for > > > >>>> renoir > > > >>>> and green_sardine, which seem to be the only ASICs with this > > > >>>> PSP > > > >>>> version. > > > >>>> Is it something that's just not been completely wired up yet > > > >>>> ? > > > >>> On APUs, the PSP is shared with the CPU so the PSP firmware > > > >>> is > > > >>> part > > > >>> of > > > >>> the sbios image. The driver doesn't load it. We only load > > > >>> it on > > > >>> dGPUs where the driver is responsible for the chip > > > >>> initialization. > > > >>> > > > >>>> That also rings a bell, that we have nothing about Secure OS > > > >>>> in > > > >>>> the doc > > > >>>> yet (not even the acronym in the glossary). > > > >>>> > > > >>>> > > > >>>>> I'm not too familiar with the PSP's path to memory from the > > > >>>>> GPU > > > >>>>> perspective. IIRC, most memory used by the PSP goes > > > >>>>> through > > > >>>>> carve > > > >>>>> out > > > >>>>> "vram" on APUs so it should work, but I would double check > > > >>>>> if > > > >>>>> there > > > >>>>> are any system memory allocations that used to interact > > > >>>>> with > > > >>>>> the PSP > > > >>>>> and see if changing them to vram helps. It does work with > > > >>>>> the > > > >>>>> IOMMU > > > >>>>> enabled on bare metal, so it should work in passthrough as > > > >>>>> well > > > >>>>> in > > > >>>>> theory. > > > >>>> I can see a single case in the PSP code where GTT is used > > > >>>> instead > > > >>>> of > > > >>>> vram: to create fw_pri_bo when SR-IOV is not used (and there > > > >>>> has > > > >>>> to be a reason, since the SR-IOV code path does use vram). > > > >>>> Changing it to vram does not make a difference, but then the > > > >>>> only bo that seems to be used at that point is the one for > > > >>>> the > > > >>>> psp ring, > > > >>>> which is allocated in vram, so I'm not too much surprised. > > > >>>> > > > >>>> Maybe I should double-check bo_create calls to hunt for more > > > >>>> ? > > > >>> We looked into this a bit ourselves and ran into the same > > > >>> issues. > > > >>> We'd probably need to debug this with the PSP team to make > > > >>> further > > > >>> progress, but this was not productized so neither team had > > > >>> the > > > >>> resources to delve further. > > > >>> > > > >>> Alex > > > >>> > > > >>>> > > > >>>> [0] > > > >>>> https://github.com/PSPReverse/psp-docs/blob/master/masterthesis-eichner-psp-2020.pdf > > > > > > >