On Thu, Aug 19, 2021 at 10:14 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > On Thu, Aug 19, 2021 at 12:41 PM Andreas Gruenbacher > <agruenba@xxxxxxxxxx> wrote: > > > > Hmm, what if GUP is made to skip VM_IO vmas without adding anything to > > the pages array? That would match fault_in_iov_iter_writeable, which > > is modeled after __mm_populate and which skips VM_IO and VM_PFNMAP > > vmas. > > I don't understand what you mean.. GUP already skips VM_IO (and > VM_PFNMAP) pages. It just returns EFAULT. > > We could make it return another error. We already have DAX and > FOLL_LONGTERM returning -EOPNOTSUPP. > > Of course, I think some code ends up always just returning "number of > pages looked up" and might return 0 for "no pages" rather than the > error for the first page. > > So we may end up having interfaces that then lose that explanation > error code, but I didn't check. > > But we couldn't make it just say "skip them and try later addresses", > if that is what you meant. THAT makes no sense - that would just make > GUP look up some other address than what was asked for. get_user_pages has a start and a nr_pages argument, which specifies an address range from start to start + nr_pages * PAGE_SIZE. If pages != NULL, it adds a pointer to that array for each PAGE_SIZE subpage. I was thinking of skipping over VM_IO vmas in that process, so when the range starts in a mappable vma, runs into a VM_IO vma, and ends in a mappable vma, the pages in the pages array would be discontiguous; they would only cover the mappable vmas. But that would make it difficult to make sense of what's in the pages array. So scratch that idea. > > > I also do still think that even regardless of that, we want to just > > > add a FOLL_NOFAULT flag that just disables calling handle_mm_fault(), > > > and then you can use the regular get_user_pages(). > > > > > > That at least gives us the full _normal_ page handling stuff. > > > > And it does fix the generic/208 failure. > > Good. So I think the approach is usable, even if we might have corner > cases left. > > So I think the remaining issue is exactly things like VM_IO and > VM_PFNMAP. Do the fstests have test-cases for things like this? It > _is_ quite specialized, it might be a good idea to have that. > > Of course, doing direct-IO from special memory regions with zerocopy > might be something special people actually want to do. But I think > we've had that VM_IO flag testing there basically forever, so I don't > think it has ever worked (for some definition of "ever"). The v6 patch queue should handle those cases acceptably well for now, but I don't think we have tests covering that at all. Thanks, Andreas