Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> writes: > On Mon, Jul 08, 2024 at 04:43:15PM GMT, Liam R. Howlett wrote: >> ... >> The functionality here has changed >> --- from --- >> may_expand_vm() check >> can_modify_mm() check >> arch_unmap() >> vms_gather_munmap_vmas() >> ... >> >> --- to --- >> can_modify_mm() check >> arch_unmap() >> vms_gather_munmap_vmas() >> may_expand_vm() check >> ... >> >> vms_gather_munmap_vmas() does nothing but figures out what to do later, >> but could use memory and can fail. >> >> The user implications are: >> >> 1. The return type on the error may change to -EPERM from -ENOMEM, if >> you are not allowed to expand and are trying to overwrite mseal()'ed >> VMAs. That seems so very rare that I'm not sure it's worth mentioning. >> >> >> 2. arch_unmap() called prior to may_expand_vm(). >> powerpc uses this to set mm->context.vdso = NULL if mm->context.vdso is >> within the unmap range. User implication of this means that an >> application my set the vdso to NULL prior to hitting the -ENOMEM case in >> may_expand_vm() due to the address space limit. >> >> Assuming the removal of the vdso does not cause the application to seg >> fault, then the user visible change is that any vdso call after a failed >> mmap(MAP_FIXED) call would result in a seg fault. The only reason it >> would fail is if the mapping process was attempting to map a large >> enough area over the vdso (which is accounted and in the vma tree, >> afaict) and ran out of memory. Note that this situation could arise >> already since we could run out of memory (not accounting) after the >> arch_unmap() call within the kernel. >> >> The code today can suffer the same fate, but not by the accounting >> failure. It can happen due to failure to allocate a new vma, >> do_vmi_munmap() failure after the arch_unmap() call, or any of the other >> failure scenarios later in the mmap_region() function. >> >> At the very least, this requires an expanded change log. > > Indeed, also (as mentioned on IRC) I feel like we need to look at whether > we _truly_ need this arch_unmap() call for a single, rather antiquated, > architecture. You can call it "niche" or "irrelevant" or "fringe", but "antiquated" is factually wrong :) Power10 came out of the fab just a few years ago at 7nm. > I mean why are they unmapping the VDSO, why is that valid, why does it need > that field to be set to NULL, is it possible to signify that in some other > way etc.? It was originally for CRIU. So a niche workload on a niche architecture. But from the commit that added it, it sounds like CRIU was using mremap, which should be handled these days by vdso_mremap(). So it could be that arch_unmap() is not actually needed for CRIU anymore. Then I guess we have to decide if removing our arch_unmap() would be an ABI break, regardless of whether CRIU needs it or not. cheers