On 10/2/19 10:28 PM, Linus Torvalds wrote: > On Wed, Oct 2, 2019 at 12:09 PM Thomas Hellström (VMware) > <thomas_os@xxxxxxxxxxxx> wrote: >> Yes I typically tend towards using a "namespace_object_operation" naming >> scheme, with "as_dirty" being the namespace here, > We discourage that kind of mindless namespacing for core stuff. > > It makes sense in a driver or a filesystem: when there are 20+ > different filesystems all implementing the same operation, you can't > have descriptive and natural names that are unique. So having a > namespace prefix for the filesystem or driver makes sense. But even > then it tends ot be just a simple name, not the op it does. Understood. > >> Looking at Matthew's suggestion but lining up with >> "unmap_mapping_range()", perhaps we could use "clean_mapping_range" and >> "wp_mapping_range"? > Yes, I agree with Willy that "dirty" is kind of implicit when the > operation is to clean something, so the above sounds sane to me. > > The wp part I'm not entirely sure about: you're not actually > write-protecting the range. You're doing something different. You're > only doing it for shared writable mappings. Both the cleaning operation and the wp operation operate on shared writable mappings, and since they are also both restricted to entries that may take part in dirty-tracking (they're ignoring pmds and puds), so perhaps a "dirty" may make sense anyway, and to point out the similarity: clean_mapping_dirty_range() and wp_mapping_dirty_range()"? > And I'd rather see > separate 'struct mm_walk_ops' than shared ones that then have a flag > in a differfent structure to change behavior. > > Yes, yes, some of the levels share the same logic, but that just means > that you can use the same function pointer for that level in the > different "clean" vs "wp" mm_walk_op. I think that this comment and this last one: > Also, why do you loop inside the pmd_entry function, instead of just > having a pte_entry function?" are tied together. The pagewalk code is kind of awkward since if you provide a pte_entry function, then huge pmds are unconditionally split, even if handled in pmd_entry, forcing pmd-aware users to handle also ptes in pmd_entry(). I'm not entirely sure why, but it looks like it's to avoid a race where huge pmds could otherwise be unintentionally split if appearing *after* the pmd_entry() call. Other pagewalk users do the same here, see for example clear_refs_pte_range(); https://elixir.bootlin.com/linux/latest/source/fs/proc/task_mmu.c#L1040 Also the pagewalk walk_pte_range() for some reason doesn't take the page table lock, which means that a pte may well be cleared under us while we have it cached for modification. For these reasons we can't use the pte_entry, even internally and this means we have three choices: a) Selecting the pte function using a bool. Saves code and avoids extra indirect function call. b) Duplicating the pmd_entry with a different pte function, also duplicating the ops. Avoids extra indirect function call but some extra code. c) Instead of the bool, use another function pointer in yet another ops pointed to by the private structure. Saves some code. I opted for a) here, but I can of course change that if needed? > > Also, looking closer at this patch, this makes me go "Whaa?": > > + pte = (mm == &init_mm) ? > + pte_offset_kernel(pmd, addr) : > + pte_offset_map_lock(mm, pmd, addr, &ptl); > > because I don't think that's sensible. When do you have a vma in kernel space? > > Also, why do you loop inside the pmd_entry function, instead of just > having a pte_entry function? Yes, that was a blind copy from v1 of the code that used "apply_to_page_range()". I'll fix that up. Thanks, /Thomas > > Linus >