> -----Original Message-----
> From: Intel-xe <intel-xe-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Mrozek, Michal
> Sent: Tuesday, November 19, 2024 6:12 PM
> To: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>; Christian König <christian.koenig@xxxxxxx>;
> Brost, Matthew <matthew.brost@xxxxxxxxx>; dri-devel@xxxxxxxxxxxxxxxxxxxxx; intel-xe@xxxxxxxxxxxxxxxxxxxxx
> Cc: Graunke, Kenneth W <kenneth.w.graunke@xxxxxxxxx>; Landwerlin, Lionel G <lionel.g.landwerlin@xxxxxxxxx>;
> Souza, Jose <jose.souza@xxxxxxxxx>; simona.vetter@xxxxxxxx; thomas.hellstrom@xxxxxxxxxxxxxxx;
> boris.brezillon@xxxxxxxxxxxxx; airlied@xxxxxxxxx; mihail.atanassov@xxxxxxx; steven.price@xxxxxxx;
> shashank.sharma@xxxxxxx
> Subject: RE: [RFC PATCH 13/29] drm/xe/mmap: Add mmap support for PCI memory barrier
>
> "Adding Michal from the compute userspace team for sharing references to
> the code.
>
> Quoting Christian König (2024-11-19 12:00:44)
> > Am 19.11.24 um 00:37 schrieb Matthew Brost:
> > > From: Tejas Upadhyay <tejas.upadhyay@xxxxxxxxx>
> > >
> > > In order to avoid having userspace use MI_MEM_FENCE, we are adding a
> > > mechanism for userspace to generate a PCI memory barrier with low
> > > overhead (avoiding an IOCTL call as well as a write to VRAM, both of
> > > which add some overhead).
> > >
> > > This is implemented by memory-mapping a page as uncached that is
> > > backed by MMIO on the dGPU, thus allowing userspace to do a memory
> > > write to the page without invoking an IOCTL.
> > > We select MMIO that is not accessible from the PCI bus, so the MMIO
> > > writes themselves are ignored, but the PCI memory barrier still takes
> > > effect because the MMIO filtering happens after the memory barrier.
> > >
> > > When we detect the specially defined offset in mmap(), we map the 4K
> > > page which contains the last page of the doorbell MMIO range to
> > > userspace for this purpose.
> >
> > Well that is quite a hack, but don't you still need a memory barrier
> > instruction? E.g. m_fence?
>
> I guess you're referring to the userspace usage directions? Yeah, userspace
> definitely has to make sure that the write has actually propagated to the PCI
> bus before it can assume the serialization happens on the GPU. I think the
> userspace folks should be able to explain how exactly they orchestrate that.
> Michal, can you or somebody else share the respective lines of code in the
> userspace driver?
>
> At this time, userspace only enables this on x86, but it could also support
> other more exotic platforms via libpciaccess.
>
> > And why don't you expose the real doorbell instead of the last
> > (unused?) page of the MMIO region?
>
> Doorbells are a complete red herring here.
>
> The chosen page just happens to be a full 4K MMIO page where any writes
> coming over the PCI bus get dropped (and reads return zero) by the GPU.
> Such a dummy (from the CPU point of view) 4K MMIO page allows doing a CPU
> write that generates a PCI bus transaction, where the transaction itself is
> essentially a NOP. But as the transaction falls into the MMIO address range,
> it will trigger a serialization of the incoming traffic on the GPU side,
> before being ignored.
>
> Regards, Joonas
> "
>
> Here is the appropriate path:
> https://github.com/intel/compute-runtime/blob/f589408848128434e410b6b4c2a9107ff78a74e9/shared/source/direct_submission/direct_submission_hw.inl#L437
>
> The flow is as follows:
> 1. Do updates to the shared memory between CPU/GPU using the WC memory mapping.
> 2. Emit an sfence instruction to make sure there is no reordering on the CPU side.
> 3. Emit the pciBarrier write (this patch); this ensures that all earlier transactions are properly ordered from the GPU side.
>
> So the PCI memory barrier is submitted after the sfence instruction, and that
> makes sure that all earlier transactions are properly ordered.
>
> Michal
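
For reference, the flow above boils down to roughly the following on the CPU
side (x86 only, matching what the compute runtime enables today). This is an
illustrative sketch only: the mmap offset constant, the render node path and
the value written are placeholders, not the actual uAPI definitions from this
series.

  /*
   * Illustrative sketch only: PCI_BARRIER_MMAP_OFFSET and the render node
   * path below are placeholders, not names defined by this patch series.
   */
  #include <fcntl.h>
  #include <stdint.h>
  #include <sys/mman.h>
  #include <unistd.h>
  #include <immintrin.h>  /* _mm_sfence() */

  #define PCI_BARRIER_MMAP_OFFSET (0x50ull << 12)  /* placeholder offset */

  static volatile uint32_t *map_pci_barrier(int fd)
  {
          /* Map the 4K dummy MMIO-backed page exposed by the driver. */
          void *page = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED, fd,
                            PCI_BARRIER_MMAP_OFFSET);
          return page == MAP_FAILED ? NULL : page;
  }

  int main(void)
  {
          int fd = open("/dev/dri/renderD128", O_RDWR);  /* placeholder node */
          volatile uint32_t *barrier;

          if (fd < 0)
                  return 1;
          barrier = map_pci_barrier(fd);
          if (!barrier)
                  return 1;

          /* 1. Update the CPU/GPU shared memory via its WC mapping (elided). */

          /* 2. sfence: keep the WC writes ordered before the barrier write. */
          _mm_sfence();

          /*
           * 3. Dummy write to the MMIO-backed page. The value is dropped by
           *    the GPU, but the PCI transaction serializes the earlier
           *    traffic on the GPU side before being ignored.
           */
          *barrier = 0;

          munmap((void *)barrier, 4096);
          close(fd);
          return 0;
  }

The value written carries no meaning; what matters is that the posted PCI
transaction into the MMIO range makes the GPU serialize the earlier WC traffic
before dropping it, so no IOCTL call or VRAM write is needed.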
https://patchwork.freedesktop.org/patch/629628/ is a separately reviewed
submission intended to be merged standalone. It will be merged if there are
no objections.

Thanks,
Tejas