在 2024-07-02星期二的 10:10 +0200,Christian König写道: > Am 02.07.24 um 10:08 schrieb Christian König: > > Am 02.07.24 um 03:46 schrieb Icenowy Zheng: > > > 在 2024-07-01星期一的 13:40 +0200,Christian König写道: > > > > Am 29.06.24 um 22:51 schrieb Icenowy Zheng: > > > > > 于 2024年6月30日 GMT+08:00 03:57:47,Jiaxun Yang > > > > > <jiaxun.yang@xxxxxxxxxxx> 写道: > > > > > > 在2024年6月29日六月 上午6:22,Icenowy Zheng写道: > > > > > > [...] > > > > > > > @@ -302,6 +302,10 @@ pgprot_t ttm_io_prot(struct > > > > > > > ttm_buffer_object *bo, > > > > > > > struct ttm_resource *res, > > > > > > > caching = res->bus.caching; > > > > > > > } > > > > > > > > > > > > > > + /* Downgrade cached mapping for non-snooping > > > > > > > devices */ > > > > > > > + if (!bo->bdev->dma_coherent && caching == > > > > > > > ttm_cached) > > > > > > > + caching = ttm_write_combined; > > > > > > Hi Icenowy, > > > > > > > > > > > > Thanks for your patch! You saved many non-coh PCIe host > > > > > > implementations a day!. > > > > Ah, wait a second. > > > > > > > > Such a thing as non-coherent PCIe implementation doesn't exist. > > > > The > > > > PCIe > > > > specification makes it mandatory for memory access to be cache > > > > coherent. > > > Really? I tried to get PCIe spec 2.0, PCI spec 3.0 and PCI-X > > > addendum > > > 1.0, none of this explicitly requires the PCIe controller and the > > > CPU > > > being fully coherent. The PCI-X spec even says "Note that PCI-X, > > > like > > > conventional PCI, does not require systems to support coherent > > > caches > > > for addresses accessed by PCI-X requesters". > > > > See the very first PCI specification, AGP 2.0 and the PCIe > > extension > > for non-snooped access. > > > > Originally it wasn't well defined what the PCI 1.0 spec meant with > > coherency (e.g. snooping vs uncached). > > > > AGP was the first specification which explicitly defined that all > > AGP > > memory accesses must be non-snooped and all PCI accesses must snoop > > the CPU caches. > > > > PCIe then had an extension which defined the "No Snooping > > Attribute" > > to allow emulating the AGP behavior. > > > > For the current PCIe 6.1 specification the non-snoop extension was > > merged into the base specification. > > > > Here see section "2.2.6.5 No Snoop Attribute", e.g. "Hardware > > enforced > > cache coherency expected" > > > > As well as the notes under section 7.5.3.4 Device Control Register: > > > > Enable No Snoop - If this bit is Set, the Function is permitted to > > Set > > the No Snoop bit in the Requester > > Attributes of transactions it initiates that do not require > > hardware > > enforced cache coherency (see Section 2.2.6.5 ). > > > > To summarize it: Not snooping caches is an extension, snooping > > caches > > is mandatory. > > > > > In addition, in the perspective of Linux, I think bypassing CPU > > > cache > > > of shared memory is considered as coherent access too, see > > > dma_alloc_coherent() function's naming. > > > > Yes that's correct, but this is for platform devices. E.g. other > > I/O > > from drivers who doesn't need to work with malloced system memory > > for > > example. > > > > We have quite a bunch of V4L, sound and I also think network > > devices > > which work like that. But those are non-PCI devices. > > > > > > There are a bunch of non-compliant PCIe implementations which > > > > have > > > > broken cache coherency, but those explicitly violate the > > > > specification > > > > and because of that are not supported. > > > Regardless of it violating the spec or not, these devices work > > > with > > > Linux subsystems that use dma_alloc_coherent to allocate DMA > > > buffers > > > (which is the most common case), and GPU drivers just give out > > > cryptic > > > error messages like "ring gfx test failed" without any mention of > > > coherency issues at all, which makes the fact that Linux DRM/TTM > > > subsystem currently requires PCIe snooping CPU cache more > > > obscure. > > > > No, they don't even remotely work. You just got very basic tests > > working. > > > > Both the Vulkan as well as the OpenGL specification require that > > you > > can import "normal" malloced() system memory into the GPU driver. > > > > This is not possible without a cache coherent platform > > architecture. > > So you can't fully support those APIs. > > > > We exercised this quite extensively already and even have a > > confirmation from ARM engineers that the approach of attaching just > > any PCIe root to an ARM IP core is not supported from their side. > > > > And if I'm not completely mistaken the RISC-V specification was > > also > > updated to disallow stuff like this. > > Googling helps: > https://github.com/riscv/riscv-platform-specs/issues/83 riscv-platform-specs is a deprecated repository, and the specs are never ratified by RVI. According to its content, it should be moved to riscv-non-isa/, but the move didn't even happen. The successor of this specific part, which is reduced to "server" context, is placed at riscv-non-isa/server-soc, in a more specific form; however server-soc is also not ratified yet either, and its CCS_040 item uses "SHOULD" instead of "MUST" on "hardware enforced cache coherency". > > Regards, > Christian. > > > > > So yes you can have boards which implement non-snooped PCIe, but > > you > > get exactly zero support from hardware vendors to run software on > > it. > > > > Regards, > > Christian. > > > > > > Regards, > > > > Christian. > > > > > > > > > > Unfortunately I don't think we can safely ttm_cached to > > > > > > ttm_write_comnined, we've > > > > > > had enough drama with write combine behaviour on all > > > > > > different > > > > > > platforms. > > > > > > > > > > > > See drm_arch_can_wc_memory in drm_cache.h. > > > > > > > > > > > Yes this really sounds like an issue. > > > > > > > > > > Maybe the behavior of ttm_write_combined should furtherly be > > > > > decided > > > > > by drm_arch_can_wc_memory() in case of quirks? > > > > > > > > > > > Thanks > > > > > > > > > > > > > + > > > > > > > return ttm_prot_from_caching(caching, tmp); > > > > > > > } > > > > > > > EXPORT_SYMBOL(ttm_io_prot); > > > > > > > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c > > > > > > > b/drivers/gpu/drm/ttm/ttm_tt.c > > > > > > > index 7b00ddf0ce49f..3335df45fba5e 100644 > > > > > > > --- a/drivers/gpu/drm/ttm/ttm_tt.c > > > > > > > +++ b/drivers/gpu/drm/ttm/ttm_tt.c > > > > > > > @@ -152,6 +152,10 @@ static void > > > > > > > ttm_tt_init_fields(struct > > > > > > > ttm_tt *ttm, > > > > > > > enum ttm_caching caching, > > > > > > > unsigned long > > > > > > > extra_pages) > > > > > > > { > > > > > > > + /* Downgrade cached mapping for non-snooping > > > > > > > devices */ > > > > > > > + if (!bo->bdev->dma_coherent && caching == > > > > > > > ttm_cached) > > > > > > > + caching = ttm_write_combined; > > > > > > > + > > > > > > > ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> > > > > > > > PAGE_SHIFT) + extra_pages; > > > > > > > ttm->page_flags = page_flags; > > > > > > > ttm->dma_address = NULL; > > > > > > > diff --git a/include/drm/ttm/ttm_caching.h > > > > > > > b/include/drm/ttm/ttm_caching.h > > > > > > > index a18f43e93abab..f92d7911f50e4 100644 > > > > > > > --- a/include/drm/ttm/ttm_caching.h > > > > > > > +++ b/include/drm/ttm/ttm_caching.h > > > > > > > @@ -47,7 +47,8 @@ enum ttm_caching { > > > > > > > > > > > > > > /** > > > > > > > * @ttm_cached: Fully cached like normal system > > > > > > > memory, > > > > > > > requires that > > > > > > > - * devices snoop the CPU cache on accesses. > > > > > > > + * devices snoop the CPU cache on accesses. > > > > > > > Downgraded > > > > > > > to > > > > > > > + * ttm_write_combined when the snooping > > > > > > > capaiblity is > > > > > > > missing. > > > > > > > */ > > > > > > > ttm_cached > > > > > > > }; > > > > > > > -- > > > > > > > 2.45.2 > >