Re: [RFC PATCH 2/2] drm/ttm: downgrade cached to write_combined when snooping not available

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



在 2024-07-01星期一的 13:40 +0200,Christian König写道:
> Am 29.06.24 um 22:51 schrieb Icenowy Zheng:
> > 
> > 于 2024年6月30日 GMT+08:00 03:57:47,Jiaxun Yang
> > <jiaxun.yang@xxxxxxxxxxx> 写道:
> > > 
> > > 在2024年6月29日六月 上午6:22,Icenowy Zheng写道:
> > > [...]
> > > > @@ -302,6 +302,10 @@ pgprot_t ttm_io_prot(struct
> > > > ttm_buffer_object *bo,
> > > > struct ttm_resource *res,
> > > >                 caching = res->bus.caching;
> > > >         }
> > > > 
> > > > +       /* Downgrade cached mapping for non-snooping devices */
> > > > +       if (!bo->bdev->dma_coherent && caching == ttm_cached)
> > > > +               caching = ttm_write_combined;
> > > Hi Icenowy,
> > > 
> > > Thanks for your patch! You saved many non-coh PCIe host
> > > implementations a day!.
> 
> Ah, wait a second.
> 
> Such a thing as non-coherent PCIe implementation doesn't exist. The
> PCIe 
> specification makes it mandatory for memory access to be cache
> coherent.

Really? I tried to get PCIe spec 2.0, PCI spec 3.0 and PCI-X addendum
1.0, none of this explicitly requires the PCIe controller and the CPU
being fully coherent. The PCI-X spec even says "Note that PCI-X, like
conventional PCI, does not require systems to support coherent caches
for addresses accessed by PCI-X requesters".

In addition, in the perspective of Linux, I think bypassing CPU cache
of shared memory is considered as coherent access too, see
dma_alloc_coherent() function's naming.

> 
> There are a bunch of non-compliant PCIe implementations which have 
> broken cache coherency, but those explicitly violate the
> specification 
> and because of that are not supported.

Regardless of it violating the spec or not, these devices work with
Linux subsystems that use dma_alloc_coherent to allocate DMA buffers
(which is the most common case), and GPU drivers just give out cryptic
error messages like "ring gfx test failed" without any mention of
coherency issues at all, which makes the fact that Linux DRM/TTM
subsystem currently requires PCIe snooping CPU cache more obscure.

> 
> Regards,
> Christian.
> 
> > > 
> > > Unfortunately I don't think we can safely ttm_cached to
> > > ttm_write_comnined, we've
> > > had enough drama with write combine behaviour on all different
> > > platforms.
> > > 
> > > See drm_arch_can_wc_memory in drm_cache.h.
> > > 
> > Yes this really sounds like an issue.
> > 
> > Maybe the behavior of ttm_write_combined should furtherly be
> > decided
> > by drm_arch_can_wc_memory() in case of quirks?
> > 
> > > Thanks
> > > 
> > > > +
> > > >         return ttm_prot_from_caching(caching, tmp);
> > > >   }
> > > >   EXPORT_SYMBOL(ttm_io_prot);
> > > > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c
> > > > b/drivers/gpu/drm/ttm/ttm_tt.c
> > > > index 7b00ddf0ce49f..3335df45fba5e 100644
> > > > --- a/drivers/gpu/drm/ttm/ttm_tt.c
> > > > +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> > > > @@ -152,6 +152,10 @@ static void ttm_tt_init_fields(struct
> > > > ttm_tt *ttm,
> > > >                                enum ttm_caching caching,
> > > >                                unsigned long extra_pages)
> > > >   {
> > > > +       /* Downgrade cached mapping for non-snooping devices */
> > > > +       if (!bo->bdev->dma_coherent && caching == ttm_cached)
> > > > +               caching = ttm_write_combined;
> > > > +
> > > >         ttm->num_pages = (PAGE_ALIGN(bo->base.size) >>
> > > > PAGE_SHIFT) + extra_pages;
> > > >         ttm->page_flags = page_flags;
> > > >         ttm->dma_address = NULL;
> > > > diff --git a/include/drm/ttm/ttm_caching.h
> > > > b/include/drm/ttm/ttm_caching.h
> > > > index a18f43e93abab..f92d7911f50e4 100644
> > > > --- a/include/drm/ttm/ttm_caching.h
> > > > +++ b/include/drm/ttm/ttm_caching.h
> > > > @@ -47,7 +47,8 @@ enum ttm_caching {
> > > > 
> > > >         /**
> > > >          * @ttm_cached: Fully cached like normal system memory,
> > > > requires that
> > > > -        * devices snoop the CPU cache on accesses.
> > > > +        * devices snoop the CPU cache on accesses. Downgraded
> > > > to
> > > > +        * ttm_write_combined when the snooping capaiblity is
> > > > missing.
> > > >          */
> > > >         ttm_cached
> > > >   };
> > > > -- 
> > > > 2.45.2
> 





[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux