Am 23.11.22 um 10:30 schrieb Daniel Vetter:
On Wed, 23 Nov 2022 at 10:06, Christian König <christian.koenig@xxxxxxx> wrote:
Am 22.11.22 um 20:50 schrieb Daniel Vetter:
On Tue, 22 Nov 2022 at 20:34, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
On Tue, Nov 22, 2022 at 08:29:05PM +0100, Daniel Vetter wrote:
You nuke all the ptes. Drivers that move have slightly more than a
bare struct file, they also have a struct address_space so that
invalidate_mapping_range() works.
Okay, this is one of the ways that this can be made to work correctly,
as long as you never allow GUP/GUP_fast to succeed on the PTEs. (this
was the DAX mistake)
Hence this patch, to enforce that no dma-buf exporter gets this wrong.
Which some did, and then blamed bug reporters for the resulting splats
:-) One of the things we've reverted was the ttm huge pte support,
since that doesn't have the pmd_special flag (yet) and so would let
gup_fast through.
The problem is not only gup, a lot of people seem to assume that when
you are able to grab a reference to a page that the ptes pointing to
that page can't change any more. And that's obviously incorrect.
I witnessed tons of discussions about that already. Some customers even
modified our code assuming that and then wondered why the heck they ran
into data corruption.
It's gotten so bad that I've even proposed intentionally mangling the
page reference count on TTM allocated pages:
https://patchwork.kernel.org/project/dri-devel/patch/20220927143529.135689-1-christian.koenig@xxxxxxx/
Yeah maybe something like this could be applied after we land this
patch here.
I think both should land ASAP. We don't have any other way than to
clearly point out incorrect approaches if we want to prevent the
resulting data corruption.
Well maybe should have the same check in gem mmap code to
make sure no driver
Reads like the sentence is somehow cut of?
I think it would be better that instead of having special flags in the
ptes and vmas that you can't follow them to a page structure we would
add something to the page indicating that you can't grab a reference to
it. But this might break some use cases as well.
Afaik the problem with that is that there's no free page bits left for
these debug checks. Plus the pte+vma flags are the flags to make this
clear already.
Maybe a GFP flag to set the page reference count to zero or something
like this?
Christian.
-Daniel