On Wed, Dec 23, 2020 at 02:45:59PM -0800, Nadav Amit wrote: > I think it may be reasonable. Whatever solution used, there will be 2 users of it: uffd-wp will use whatever technique used by clear_refs_write to avoid the mmap_write_lock. My favorite is Yu's patch and not the group lock anymore. The cons is it changes the VM rules (which kind of reminds me my initial proposal of adding a spurious tlb flush if mm_tlb_flush_pending is set, except I didn't correctly specify it'd need to go in the page fault), but it still appears the simplest. > Just a proposal: At some point we can also ask ourselves whether the > “artificial" limitation of the number of software bits per PTE should really > limit us, or do we want to hold some additional metadata per-PTE by either > putting it in an adjacent page (holding 64-bits of additional software-bits > per PTE) or by finding some place in the page-struct to link to this > metadata (and have the liberty of number of bits per PTE). One of the PTE > software-bits can be repurposed to say whether there is “extra-metadata” > associated with the PTE. > > I am fully aware that there will be some overhead associated, but it > can be limited to less-common use-cases. That's a good point, so far far we didn't run out so it's not an immediate concern. (as opposed we run out in page->flags where the PG_tail went to some LSB). In general kicking the can down the road sounds like the best thing to do for those bit shortage matters, until we can't anymore at least.. There's no gain to the kernel runtime, in doing something generically good here (again see where PG_tail rightfully went). So before spending RAM and CPU, we'd need to find a more compact encoding with the bits we already have available. This reminds me again we could double check if we could make VM_UFFD_WP mutually exclusive with VM_SOFTDIRTY. I wasn't sure if it could ever happen in a legit way to use both at the same time (CRIU on a app using uffd-wp for its own internal mm management?). Theoretically it's too late already for it, but VM_UFFD_WP is relatively new, if we're sure it cannot ever happen in a legit way, it would be possible to at least evaluate/discuss it. This is an immediate matter. What we'll do if we later run out, is not an immediate matter instead, because it won't make our life any simpler to resolve it now. Thanks, Andrea