On Mon, 2023-10-02 at 16:27 +0200, Thomas Hellström wrote: > !! External Email > > On Mon, 2023-10-02 at 10:16 +0200, Thomas Hellström wrote: > > Hi, Zack > > > > On 9/26/23 19:51, Zack Rusin wrote: > > > From: Zack Rusin <zackr@xxxxxxxxxx> > > > > > > Some drivers require the mapped tt pages to be decrypted. In an > > > ideal > > > world this would have been handled by the dma layer, but the TTM > > > page > > > fault handling would have to be rewritten to able to do that. > > > > > > A side-effect of the TTM page fault handling is using a dma > > > allocation > > > per order (via ttm_pool_alloc_page) which makes it impossible to > > > just > > > trivially use dma_mmap_attrs. As a result ttm has to be very > > > careful > > > about trying to make its pgprot for the mapped tt pages match what > > > the dma layer thinks it is. At the ttm layer it's possible to > > > deduce the requirement to have tt pages decrypted by checking > > > whether coherent dma allocations have been requested and the system > > > is running with confidential computing technologies. > > > > > > This approach isn't ideal but keeping TTM matching DMAs > > > expectations > > > for the page properties is in general fragile, unfortunately proper > > > fix would require a rewrite of TTM's page fault handling. > > > > > > Fixes vmwgfx with SEV enabled. > > > > > > v2: Explicitly include cc_platform.h > > > > > > Signed-off-by: Zack Rusin <zackr@xxxxxxxxxx> > > > Fixes: 3bf3710e3718 ("drm/ttm: Add a generic TTM memcpy move for > > > page-based iomem") > > > Cc: Christian König <christian.koenig@xxxxxxx> > > > Cc: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> > > > Cc: Huang Rui <ray.huang@xxxxxxx> > > > Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx > > > Cc: linux-kernel@xxxxxxxxxxxxxxx > > > Cc: <stable@xxxxxxxxxxxxxxx> # v5.14+ > > > --- > > > drivers/gpu/drm/ttm/ttm_bo_util.c | 13 +++++++++++-- > > > drivers/gpu/drm/ttm/ttm_tt.c | 8 ++++++++ > > > include/drm/ttm/ttm_tt.h | 9 ++++++++- > > > 3 files changed, 27 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c > > > b/drivers/gpu/drm/ttm/ttm_bo_util.c > > > index fd9fd3d15101..0b3f4267130c 100644 > > > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c > > > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c > > > @@ -294,7 +294,13 @@ pgprot_t ttm_io_prot(struct ttm_buffer_object > > > *bo, struct ttm_resource *res, > > > enum ttm_caching caching; > > > > > > man = ttm_manager_type(bo->bdev, res->mem_type); > > > - caching = man->use_tt ? bo->ttm->caching : res- > > > > bus.caching; > > > + if (man->use_tt) { > > > + caching = bo->ttm->caching; > > > + if (bo->ttm->page_flags & TTM_TT_FLAG_DECRYPTED) > > > + tmp = pgprot_decrypted(tmp); > > > + } else { > > > + caching = res->bus.caching; > > > + } > > > > > > return ttm_prot_from_caching(caching, tmp); > > > } > > > @@ -337,6 +343,8 @@ static int ttm_bo_kmap_ttm(struct > > > ttm_buffer_object *bo, > > > .no_wait_gpu = false > > > }; > > > struct ttm_tt *ttm = bo->ttm; > > > + struct ttm_resource_manager *man = > > > + ttm_manager_type(bo->bdev, bo->resource- > > > > mem_type); > > > pgprot_t prot; > > > int ret; > > > > > > @@ -346,7 +354,8 @@ static int ttm_bo_kmap_ttm(struct > > > ttm_buffer_object *bo, > > > if (ret) > > > return ret; > > > > > > - if (num_pages == 1 && ttm->caching == ttm_cached) { > > > + if (num_pages == 1 && ttm->caching == ttm_cached && > > > + !(man->use_tt && (ttm->page_flags & > > > TTM_TT_FLAG_DECRYPTED))) { > > > /* > > > * We're mapping a single page, and the desired > > > * page protection is consistent with the bo. > > > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c > > > b/drivers/gpu/drm/ttm/ttm_tt.c > > > index e0a77671edd6..e4966e2c988d 100644 > > > --- a/drivers/gpu/drm/ttm/ttm_tt.c > > > +++ b/drivers/gpu/drm/ttm/ttm_tt.c > > > @@ -31,6 +31,7 @@ > > > > > > #define pr_fmt(fmt) "[TTM] " fmt > > > > > > +#include <linux/cc_platform.h> > > > #include <linux/sched.h> > > > #include <linux/shmem_fs.h> > > > #include <linux/file.h> > > > @@ -81,6 +82,13 @@ int ttm_tt_create(struct ttm_buffer_object *bo, > > > bool zero_alloc) > > > pr_err("Illegal buffer object type\n"); > > > return -EINVAL; > > > } > > > + /* > > > + * When using dma_alloc_coherent with memory encryption the > > > + * mapped TT pages need to be decrypted or otherwise the > > > drivers > > > + * will end up sending encrypted mem to the gpu. > > > + */ > > > + if (bdev->pool.use_dma_alloc && > > > cc_platform_has(CC_ATTR_MEM_ENCRYPT)) > > > > You need to use CC_ATTR_GUEST_MEM_ENCRYPT here rather than > > CC_ATTR_MEM_ENCRYPT to avoid touching and breaking the SME case and > > only > > fix the SEV / SEV-ES case. I'd also hold off the stable inclusion > > until > > it's completely verified that this doesn't break anything because if > > it > > does, I suspect all hell will break loose. > > > > With that said, for the functionality > > > > Reviewed-by: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> > > > > But I think this needs a wider Ack at the ttm / drm level for the > > approach taken. > > > > /Thomas. > > FWIW, I think that if TTM_TT_FLAG_DECRYPTED is set, it should be > possible to add a debug WARN_ON_ONCE() if the first PTE of the dma > page's kernel virtual address does not use a decrypted pgprot_t. One > way of accessing the PTEs in a platform-generic fashion is > apply_to_page_range(). Good point. Another, trivial solution to that problem of possible regression would simply be introducing: #define TTM_DEVICE_USE_DMA_ALLOC BIT(0) #define TTM_DEVICE_USE_GFP_DMA32 BIT(1) #define TTM_DEVICE_USE_DECRYPTED_SYS_MEM BIT(2) and changing ttm_device_init from: int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *funcs, struct device *dev, struct address_space *mapping, struct drm_vma_offset_manager *vma_manager, bool use_dma_alloc, bool use_dma32); to: int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *funcs, struct device *dev, struct address_space *mapping, struct drm_vma_offset_manager *vma_manager, u32 use_flags); The driver should have a lot clearer picture of whether TTM_DEVICE_USE_DECRYPTED_SYS_MEM should be used. That change requires porting the drivers to the new ttm_device_init (which is trivial) but guarantees no regressions simply by virture of having vmwgfx use TTM_DEVICE_USE_DECRYPTED_SYS_MEM only (at least initially, I imagine at least qxl would need it as well). Christian, any thoughts? z