On Thu, Sep 26, 2019 at 1:55 PM Thomas Hellström (VMware)
<thomas_os@xxxxxxxxxxxx> wrote:
>
> Well, we're working on supporting huge puds and pmds in the graphics
> VMAs, although in the write-notify cases we're looking at here, we
> would probably want to split them down to PTE level.

Well, that's what the existing walker code does if you don't have that
"pud_entry()" callback.

That said, I assume you would *not* want to do that if the huge pud/pmd
is already clean and read-only, but just continue. So you may want to
have a special pud_entry() that handles that case.

Eventually. Maybe. Although honestly, if you're doing dirty tracking, I
doubt it makes much sense to use largepages.

> Looking at zap_pud_range(), which when called from
> unmap_mapping_pages() uses identical locking (no mmap_sem), it seems
> we should be able to get away with i_mmap_lock(), making sure the
> whole page table doesn't disappear under us. So it's not clear to me
> why the mmap_sem is strictly needed here. Better to sort those
> restrictions out now rather than when huge entries start appearing.

zap_pud_range() actually does have that

        VM_BUG_ON_VMA(!rwsem_is_locked(&tlb->mm->mmap_sem), vma);

exactly for the case where it might have to split the pud entry.
Zapping the whole thing it does do without the assert.

I'm not going to swear the mmap_sem is absolutely required, since a
shared vma should be stable due to the i_mmap_lock, but splitting the
hugepage really is a fairly big deal. It can't happen if you zap the
*whole* mapping, but it can happen if you have a start/end range. Like
you do.

Also, in general it's probably not a great idea to look at
zap_page_range() (and copy_page_range()) for ideas. They are kind of
special, since they tend to be used for fundamental whole-address-space
operations (ie fork/exit), and as a result they get to do special
things that a normal page walker generally shouldn't do. It's why
they've never been converted to use the generic walker code.

              Linus
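
For illustration, here is a minimal sketch of the kind of pud_entry()
callback described above: skip a huge PUD that is already clean and
write-protected, and only fall back to splitting when it still needs
write-notification. This is not the code that eventually landed; the
function name is made up, the callback signature follows the pagewalk
interface of that era (the usual mm headers such as linux/mm.h and
linux/huge_mm.h are assumed), pud_trans_huge(), pud_devmap(),
pud_write() and pud_dirty() are existing kernel helpers, and the
"return -EAGAIN to request a split" convention is invented here since
the real walker handles splitting itself.

/*
 * Sketch only: leave an already clean, write-protected huge PUD alone,
 * and only ask for a split when write-notification is still needed.
 */
static int wp_pud_entry(pud_t *pud, unsigned long addr,
			unsigned long next, struct mm_walk *walk)
{
	pud_t pudval = READ_ONCE(*pud);

	/* Not a huge entry: let the walker descend to pmd/pte level. */
	if (!pud_trans_huge(pudval) && !pud_devmap(pudval))
		return 0;

	/*
	 * Already write-protected and clean: there is nothing to
	 * write-notify, so skip the whole range instead of splitting
	 * it down to PTEs.
	 */
	if (!pud_write(pudval) && !pud_dirty(pudval))
		return 0;

	/*
	 * Still writable or dirty: it has to be split so the pte-level
	 * callback can write-protect individual pages.  That split is
	 * the part that wants mmap_sem held, as discussed above.
	 */
	return -EAGAIN;
}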