Re: Decrypting tt maps in ttm

Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> · Wed, 20 Sep 2023 21:22:46 +0200

On 9/20/23 20:24, Zack Rusin wrote:
On Wed, 2023-09-20 at 19:17 +0200, Thomas Hellström wrote:
!! External Email

Hi, Zack

On 9/20/23 18:39, Zack Rusin wrote:
On Wed, 2023-09-20 at 12:48 +0200, Christian König wrote:
!! External Email

Am 20.09.23 um 09:36 schrieb Thomas Hellström:
Hi, Zack,

On 9/20/23 05:43, Zack Rusin wrote:
On Tue, 2023-09-19 at 09:47 +0200, Christian König wrote:
!! External Email

Am 19.09.23 um 08:56 schrieb Thomas Hellström:
On 9/19/23 07:39, Christian König wrote:
Am 19.09.23 um 03:26 schrieb Zack Rusin:
On Mon, 2023-09-18 at 16:21 -0400, Alex Deucher wrote:
!! External Email

On Mon, Sep 18, 2023 at 3:06 PM Thomas Hellström
<thomas.hellstrom@xxxxxxxxxxxxxxx> wrote:
On 9/18/23 17:52, Zack Rusin wrote:
On Mon, 2023-09-18 at 17:13 +0200, Thomas Hellström wrote:
Hi,

On 9/18/23 16:56, Thomas Hellström wrote:
Hi Zack, Christian

On 9/18/23 13:36, Christian König wrote:
Hi Zack,

adding Thomas and Daniel.

I briefly remember that I talked with Thomas and some
other
people
about that quite a while ago as well, but I don't
fully
remember the
outcome.
Found one old thread, but didn't read it:

https://lists.freedesktop.org/archives/dri-devel/2019-September/234100.html

/Thomas

Ugh. Now starting to read that thread I have a vague
recollection it all
ended with not supporting mapping any device pages
whatsoever
when SEV
was enabled, but rather resorting to llvmpipe and VM-local
bos.
Hi, Thomas.

Thanks for finding this! I'd (of course) like to solve it
properly and get
vmwgfx
running with 3d support with SEV-ES active instead of
essentially
disabling
the
driver when SEV-ES is active.

I think there are two separate discussions there, the
non-controversial one
and the
controversial one:
1) The non-controversial: is there a case where drivers
would
want encrypted
memory
for TT pages but not for io mem mappings? Because if not
then as
Christian
pointed
out we could just add pgprot_decrypted to ttm_io_prot and be
essentially done.
The
current method of decrypting io mem but leaving sys mem
mappings
encrypted is
a bit
weird anyway.

If the answer to that question is "yes, some driver does
want
the
TT mappings
to be
encrypted" then your "[PATCH v2 3/4] drm/ttm, drm/vmwgfx:
Correctly support
support
AMD memory encryption" solves that. I think getting one of
those
two in makes
sense
regardless of everything else, agreed?
Well, there is more to it I think.

IIRC, the AMD SME encryption mode has a way for a device to
have the
memory controller (?) encrypt / decrypt device traffic by
using an
address range alias, so in theory it supports encrypted TT
pages, and
the dma-layer may indeed hand encrypted DMA pages to TTM on
such
systems
depending on the device's DMA mask. That's why I think that
force_dma_unencrypted() export was needed, and If the amdgpu
driver
accesses TT memory in SME mode *without* pgprot_decrypted()
and it
still
works, then I think that mode is actually used. How could it
otherwise work?
For SME, as long as the encrypted bit is set in the physical
address
used for DMA, the memory controller will handle the
encrypt/decrypt
for the device.  For devices with a limited dma mask, you need
to use
the IOMMU so that the encrypted bit is retained when the address
hits
the memory controller.
How does that work on systems with swiotlb, e.g. swiotlb=force, or
i.e. what would
decrypt the ttm tt mappings when copying between system and vram
when iommu is
disabled/absent?
SME makes it mandatory that all devices can handle the physical
address used for DMA, either native or with the help of IOMMU.

Hacks like SWIOTLB are not directly supported as far as I know.
Maybe
somehow SWIOTLB manually decrypts the data while copying it or
something like this, but I'm not 100% sure if that is actually
implemented.

Regards,
Christian.
A bold guess after looking at various code and patches:

1) Devices under SME that don't support the encryption bit and SEV:
a) Coherent memory is unencrypted.
b) Streaming DMA under IOMMU: The IOMMU sets the encrypted bit.
c) Streaming DMA with SWIOTLB: The bounce buffer is unencrypted.
Copying to/from bounce-buffer decrypts/encrypts.

2) Devices under SME that do support the encryption bit (which I
believe is most graphics devices in general on SME systems, not just
amdgpu; it "just works")
*) Coherent memory is encrypted. The DMA layer sets dma addresses and
pgprot accordingly.
*) Streaming DMA is encrypted.

So the bug in TTM would then be it's not handling 1a) and 1b)
correctly.

Remedy:
1b) Shouldn't be used with encryption.
1a) This is what we should try to fix. Exporting
dma_force_unencrypted() didn't seem to be a way forward. Properly
fixing this would, I guess, mean implement the missing functionality
in the dma layer: For vmap / kmap we could simply reuse the virtual
addresses we get back from dma_alloc_coherent(), but for faulting one
would want something like dma_coherent_insert_pfn() (if it doesn't
exist already) after a proper disussion with Christoph Hellwig.
Christoph once pointed me to dma_mmap_attrs() for this, but I never
found the time to fully look into it.
Hmm, yea, that would make sense
https://elixir.bootlin.com/linux/latest/source/kernel/dma/direct.c#L564
Replacing the vmap's with dma_mmap_attrs would probably fix this, but
it would
require a bit of extra setup.

So we're saying that yes, we don't want unconditional pgprot_decrypt
in ttm_io_prot.
We'd like to leave those tt mappings as encrypted when possible and
instead maybe
add a vaddr to ttm_tt (or extract it from the pages->private via the
ttm_pool_dma,
but that seems rather ugly),
It could probably be extracted from pages->private from a helper in
the ttm pool code, (Christian has a final saying here). However, that
requires that all ttm_tts are built from a single dma_alloc chunk. Not
sure that's the case? In that case we're back to square zero for vmaps.
Nope they aren't and yes we are back to square one with that.
Well, that's my favorite square. Number one, just like me...

Maybe we're overthinking this particular problem a bit. As is use_dma_alloc in
ttm
is only set in two cases:
- driver explicitly wants coherent mappings (vmwgfx, which require decrypted
pages)
- driver needs swiotlb (which, as was pointed out, would require the pages to be
decrypted as well)

So use_dma_alloc always requires the pages to be decrypted.
IIRC moving forward it doesn't, since there is (or at least there was)
implement missing TTM functionality in the dma layer and most TTM
drivers should at least support dma coherent memory. That means all
devices supporting a sufficiently large dma mask will break with SME and
your proposal then.

Perhaps if we condition that on
"cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)" that will capture all the
SEV cases, and limit the existing bug to the hopefully very few TTM
devices with limited dma mask on SME.
Ah, I wasn't aware those exist, do you know what platforms are those? I can try to
find one around here to see.

My understanding is cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) will 
return true in the guest iff SEV is active, and can be used in TTM as a 
poor man's force_dma_unencrypted(), enabling the functionality in your 
diff. It looks like a similar check is present in vmwgfx to detect SEV, 
but also see below.

And they don't really break, they just might unnecessarily decrypt tt pages, right?

No, with SME, dma from hw will encrypt the content, because the dma 
layer will set the "encrypt" bit in the physical address given to the 
iommu or the device in case iommu is not active, but a subsequent 
reading the content using the CPU won't decrypt so CPU and device will 
have different views of the page.

Also the linear kernel mapping PTEs will conflict in encryption mode 
with the ones TTM sets up, and IIRC that's forbidden in the SEV spec. 
(The x86 arch code goes through some serious work to flush out caches 
and TLBs to convert a page kernel linear mapping from encrypted to 
non-encrypted,

https://elixir.bootlin.com/linux/latest/source/arch/x86/mm/pat/set_memory.c#L2129

and that is also seen as pretty heavy dma_alloc_coherent() latency).

So the pgprot_t TTM sets up *must* be identical to the one used by the 
dma layer, so anything we should be aware here that anything we do in 
TTM less than adding needed functionality in the dma layer is 
second-guessing what the dma layer does internally and is not really the 
right solution.

/Thomas

In that case I'd think that we'd lean towards "unnecessarily decrypted" and not the
current "complete broken" (the memory is encrypted and the hardware can't handle
it). Although if we're actively thinking about rewriting dma/page fault handling in
ttm, then yes, this doesn't make sense. Otherwise it'd seem like a decent solution
to not being able to even present with SEV.

z