On Tue, May 02, 2023 at 07:18:21PM -0700, Andrew Morton wrote: > On Tue, 2 May 2023 23:51:35 +0100 Lorenzo Stoakes <lstoakes@xxxxxxxxx> wrote: > > > Writing to file-backed dirty-tracked mappings via GUP is inherently broken > > as we cannot rule out folios being cleaned and then a GUP user writing to > > them again and possibly marking them dirty unexpectedly. > > > > This is especially egregious for long-term mappings (as indicated by the > > use of the FOLL_LONGTERM flag), so we disallow this case in GUP-fast as > > we have already done in the slow path. > > > > We have access to less information in the fast path as we cannot examine > > the VMA containing the mapping, however we can determine whether the folio > > is anonymous or belonging to a whitelisted filesystem - specifically > > hugetlb and shmem mappings. > > > > We take special care to ensure that both the folio and mapping are safe to > > access when performing these checks and document folio_fast_pin_allowed() > > accordingly. > > > > It's important to note that there are no APIs allowing users to specify > > FOLL_FAST_ONLY for a PUP-fast let alone with FOLL_LONGTERM, so we can > > always rely on the fact that if we fail to pin on the fast path, the code > > will fall back to the slow path which can perform the more thorough check. > > arm allnoconfig said > > mm/gup.c:115:13: warning: 'folio_fast_pin_allowed' defined but not used [-Wunused-function] > 115 | static bool folio_fast_pin_allowed(struct folio *folio, unsigned int flags) > | ^~~~~~~~~~~~~~~~~~~~~~ > > so I moved the definition inside CONFIG_ARCH_HAS_PTE_SPECIAL. > > > > mm/gup.c | 154 ++++++++++++++++++++++++++--------------------------- > 1 file changed, 77 insertions(+), 77 deletions(-) > > --- a/mm/gup.c~mm-gup-disallow-foll_longterm-gup-fast-writing-to-file-backed-mappings-fix > +++ a/mm/gup.c > @@ -96,83 +96,6 @@ retry: > return folio; > } > > -/* > - * Used in the GUP-fast path to determine whether a pin is permitted for a > - * specific folio. > - * > - * This call assumes the caller has pinned the folio, that the lowest page table > - * level still points to this folio, and that interrupts have been disabled. > - * > - * Writing to pinned file-backed dirty tracked folios is inherently problematic > - * (see comment describing the writable_file_mapping_allowed() function). We > - * therefore try to avoid the most egregious case of a long-term mapping doing > - * so. > - * > - * This function cannot be as thorough as that one as the VMA is not available > - * in the fast path, so instead we whitelist known good cases and if in doubt, > - * fall back to the slow path. > - */ > -static bool folio_fast_pin_allowed(struct folio *folio, unsigned int flags) > -{ > - struct address_space *mapping; > - unsigned long mapping_flags; > - > - /* > - * If we aren't pinning then no problematic write can occur. A long term > - * pin is the most egregious case so this is the one we disallow. > - */ > - if ((flags & (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) != > - (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) > - return true; > - > - /* The folio is pinned, so we can safely access folio fields. */ > - > - /* Neither of these should be possible, but check to be sure. */ > - if (unlikely(folio_test_slab(folio) || folio_test_swapcache(folio))) > - return false; > - > - /* hugetlb mappings do not require dirty-tracking. */ > - if (folio_test_hugetlb(folio)) > - return true; > - > - /* > - * GUP-fast disables IRQs. When IRQS are disabled, RCU grace periods > - * cannot proceed, which means no actions performed under RCU can > - * proceed either. > - * > - * inodes and thus their mappings are freed under RCU, which means the > - * mapping cannot be freed beneath us and thus we can safely dereference > - * it. > - */ > - lockdep_assert_irqs_disabled(); > - > - /* > - * However, there may be operations which _alter_ the mapping, so ensure > - * we read it once and only once. > - */ > - mapping = READ_ONCE(folio->mapping); > - > - /* > - * The mapping may have been truncated, in any case we cannot determine > - * if this mapping is safe - fall back to slow path to determine how to > - * proceed. > - */ > - if (!mapping) > - return false; > - > - /* Anonymous folios are fine, other non-file backed cases are not. */ > - mapping_flags = (unsigned long)mapping & PAGE_MAPPING_FLAGS; > - if (mapping_flags) > - return mapping_flags == PAGE_MAPPING_ANON; > - > - /* > - * At this point, we know the mapping is non-null and points to an > - * address_space object. The only remaining whitelisted file system is > - * shmem. > - */ > - return shmem_mapping(mapping); > -} > - > /** > * try_grab_folio() - Attempt to get or pin a folio. > * @page: pointer to page to be grabbed > @@ -2474,6 +2397,83 @@ static void __maybe_unused undo_dev_page > > #ifdef CONFIG_ARCH_HAS_PTE_SPECIAL > /* > + * Used in the GUP-fast path to determine whether a pin is permitted for a > + * specific folio. > + * > + * This call assumes the caller has pinned the folio, that the lowest page table > + * level still points to this folio, and that interrupts have been disabled. > + * > + * Writing to pinned file-backed dirty tracked folios is inherently problematic > + * (see comment describing the writable_file_mapping_allowed() function). We > + * therefore try to avoid the most egregious case of a long-term mapping doing > + * so. > + * > + * This function cannot be as thorough as that one as the VMA is not available > + * in the fast path, so instead we whitelist known good cases and if in doubt, > + * fall back to the slow path. > + */ > +static bool folio_fast_pin_allowed(struct folio *folio, unsigned int flags) > +{ > + struct address_space *mapping; > + unsigned long mapping_flags; > + > + /* > + * If we aren't pinning then no problematic write can occur. A long term > + * pin is the most egregious case so this is the one we disallow. > + */ > + if ((flags & (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) != > + (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) > + return true; > + > + /* The folio is pinned, so we can safely access folio fields. */ > + > + /* Neither of these should be possible, but check to be sure. */ > + if (unlikely(folio_test_slab(folio) || folio_test_swapcache(folio))) > + return false; > + > + /* hugetlb mappings do not require dirty-tracking. */ > + if (folio_test_hugetlb(folio)) > + return true; > + > + /* > + * GUP-fast disables IRQs. When IRQS are disabled, RCU grace periods > + * cannot proceed, which means no actions performed under RCU can > + * proceed either. > + * > + * inodes and thus their mappings are freed under RCU, which means the > + * mapping cannot be freed beneath us and thus we can safely dereference > + * it. > + */ > + lockdep_assert_irqs_disabled(); > + > + /* > + * However, there may be operations which _alter_ the mapping, so ensure > + * we read it once and only once. > + */ > + mapping = READ_ONCE(folio->mapping); > + > + /* > + * The mapping may have been truncated, in any case we cannot determine > + * if this mapping is safe - fall back to slow path to determine how to > + * proceed. > + */ > + if (!mapping) > + return false; > + > + /* Anonymous folios are fine, other non-file backed cases are not. */ > + mapping_flags = (unsigned long)mapping & PAGE_MAPPING_FLAGS; > + if (mapping_flags) > + return mapping_flags == PAGE_MAPPING_ANON; > + > + /* > + * At this point, we know the mapping is non-null and points to an > + * address_space object. The only remaining whitelisted file system is > + * shmem. > + */ > + return shmem_mapping(mapping); > +} > + > +/* > * Fast-gup relies on pte change detection to avoid concurrent pgtable > * operations. > * > _ > Ack thanks for this, I think it doesn't quite cover all cases (kernel bot moaning on -next for mips), needs some more fiddly #ifdef stuff, I will spin a v9 shortly anyway to fix this up and address a few other minor things.