On 23/01/2024 12:19, David Hildenbrand wrote: > [...] > >> >> I wrote some documentation for this (based on Matthew's docs for set_ptes() in >> my version. Perhaps it makes sense to add it here, given this is overridable by >> the arch. >> >> /** >> * wrprotect_ptes - Write protect a consecutive set of pages. >> * @mm: Address space that the pages are mapped into. >> * @addr: Address of first page to write protect. >> * @ptep: Page table pointer for the first entry. >> * @nr: Number of pages to write protect. >> * >> * May be overridden by the architecture, else implemented as a loop over >> * ptep_set_wrprotect(). >> * >> * Context: The caller holds the page table lock. The PTEs are all in the same >> * PMD. >> */ >> > > I could have sworn I had a documentation at some point. Let me add some, thanks. > > [...] > >>> + >>> + /* >>> + * If we likely have to copy, just don't bother with batching. Make >>> + * sure that the common "small folio" case stays as fast as possible >>> + * by keeping the batching logic separate. >>> + */ >>> + if (unlikely(!*prealloc && folio_test_large(folio) && max_nr != 1)) { >>> + nr = folio_pte_batch(folio, addr, src_pte, pte, max_nr); >>> + if (folio_test_anon(folio)) { >>> + folio_ref_add(folio, nr); >>> + if (unlikely(folio_try_dup_anon_rmap_ptes(folio, page, >>> + nr, src_vma))) { >> >> What happens if its not the first page of the batch that fails here? Aren't you >> signalling that you need a prealloc'ed page for the wrong pte? Shouldn't you >> still batch copy all the way up to the failing page first? Perhaps it all comes >> out in the wash and these events are so infrequent that we don't care about the >> lost batching opportunity? > > I assume you mean the weird corner case that some folio pages in the range have > PAE set, others don't -- and the folio maybe pinned. > > In that case, we fallback to individual pages, and might have preallocated a > page although we wouldn't have to preallocate one for processing the next page > (that doesn't have PAE set). > > It should all work, although not optimized to the extreme, and as it's a corner > case, we don't particularly care. Hopefully, in the future we'll only have a > single PAE flag per folio. > > Or am I missing something? No, your explanation makes sense. Just wanted to check this all definitely worked, because the flow is slightly different to my previous version that was doing try_dup_rmap page-by-page. > >> >>> + folio_ref_sub(folio, nr); >>> + return -EAGAIN; >>> + } >>> + rss[MM_ANONPAGES] += nr; >>> + VM_WARN_ON_FOLIO(PageAnonExclusive(page), folio); >>> + } else { >>> + folio_ref_add(folio, nr); >> >> Perhaps hoist this out to immediately after folio_pte_batch() since you're >> calling it on both branches? > > Makes sense, thanks. >