On Wed, 23 Nov 2022 at 10:06, Christian König <christian.koenig@xxxxxxx> wrote: > Am 22.11.22 um 20:50 schrieb Daniel Vetter: > > On Tue, 22 Nov 2022 at 20:34, Jason Gunthorpe <jgg@xxxxxxxx> wrote: > >> On Tue, Nov 22, 2022 at 08:29:05PM +0100, Daniel Vetter wrote: > >>> You nuke all the ptes. Drivers that move have slightly more than a > >>> bare struct file, they also have a struct address_space so that > >>> invalidate_mapping_range() works. > >> Okay, this is one of the ways that this can be made to work correctly, > >> as long as you never allow GUP/GUP_fast to succeed on the PTEs. (this > >> was the DAX mistake) > > Hence this patch, to enforce that no dma-buf exporter gets this wrong. > > Which some did, and then blamed bug reporters for the resulting splats > > :-) One of the things we've reverted was the ttm huge pte support, > > since that doesn't have the pmd_special flag (yet) and so would let > > gup_fast through. > > The problem is not only gup, a lot of people seem to assume that when > you are able to grab a reference to a page that the ptes pointing to > that page can't change any more. And that's obviously incorrect. > > I witnessed tons of discussions about that already. Some customers even > modified our code assuming that and then wondered why the heck they ran > into data corruption. > > It's gotten so bad that I've even proposed intentionally mangling the > page reference count on TTM allocated pages: > https://patchwork.kernel.org/project/dri-devel/patch/20220927143529.135689-1-christian.koenig@xxxxxxx/ Yeah maybe something like this could be applied after we land this patch here. Well maybe should have the same check in gem mmap code to make sure no driver > I think it would be better that instead of having special flags in the > ptes and vmas that you can't follow them to a page structure we would > add something to the page indicating that you can't grab a reference to > it. But this might break some use cases as well. Afaik the problem with that is that there's no free page bits left for these debug checks. Plus the pte+vma flags are the flags to make this clear already. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch