On Mon, Apr 17, 2023 at 03:00:16PM +0100, Lorenzo Stoakes wrote:
> On Mon, Apr 17, 2023 at 10:26:09AM -0300, Jason Gunthorpe wrote:
> > On Mon, Apr 17, 2023 at 02:19:16PM +0100, Lorenzo Stoakes wrote:
> >
> > > > I'd rather see something like FOLL_ALLOW_BROKEN_FILE_MAPPINGS than
> > > > io_uring open coding this kind of stuff.
> > > >
> > >
> > > How would the semantics of this work? What is broken? It is a little
> > > frustrating that we have FOLL_ANON but hugetlb as an outlying case, adding
> > > FOLL_ANON_OR_HUGETLB was another consideration...
> >
> > It says "historically this user has accepted file backed pages and
> > we think there may actually be users doing that, so don't break the
> > uABI"
>
> Having written a bunch here I suddenly realised that you probably mean for
> this flag to NOT be applied to the io_uring code and thus have it enforce
> the 'anonymous or hugetlb' check by default?

Yes

> So you mean to disallow file-backed page pinning as a whole unless this
> flag is specified?

Yes

> For FOLL_GET I can see that access to the underlying
> data is dangerous as the memory may get reclaimed or migrated, but surely
> DMA-pinned memory (as is the case here) is safe?

No, it is all broken; only read-only access is safe. We are trying to
get to a point where pin access will interact properly with the
filesystem, but it isn't done yet.

> Or is this a product more so of some kernel process accessing file-backed
> pages for a file system which expects write-notify semantics and doesn't
> get them in this case, which could indeed be horribly broken.

Yes, broadly

> I am definitely in favour of cutting things down if possible, and very much
> prefer the use of uaccess if we are able to do so rather than GUP.
>
> I do feel that GUP should be focused purely on pinning memory rather than
> manipulating it (whether read or write) so I agree with this sentiment.

Yes, someone needs to be brave enough to go and try to adjust these
old places :)

I see in the git history this was added to solve CVE-2018-1120 - e.g.
FUSE can hold off fault-in indefinitely. So the flag is really badly
misnamed - it is "FOLL_DONT_BLOCK_ON_USERSPACE" and anon memory is a
simple, but overly narrow, way to get that property.

If it is changed to use kthread_use_mm() it needs a VMA based check
for the same idea.

Jason