* Pedro Falcato <pedro.falcato@xxxxxxxxx> [240809 14:45]:
> On Fri, Aug 9, 2024 at 5:06 PM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote:
> >
> > * Pedro Falcato <pedro.falcato@xxxxxxxxx> [240807 17:13]:
> > > Delegate all can_modify checks to the proper places. Unmap checks are
> > > done in do_unmap (et al).
> > >
> > > This patch allows for mremap partial failure in certain cases (for
> > > instance, when destination VMAs aren't sealed, but the source VMA is).
> > > It shouldn't be too troublesome, as you'd need to go out of your way to
> > > do illegal operations on a VMA.
> >
> > As mseal() is supposed to be a security thing, is the illegal operation
> > not a concern?
>
> My 3 cents (note: I'm not a security guy):
>
> - Linux m*() operations have been allowed to partially fail for ages.
> POSIX only disallows this in the munmap case (which is why we need all
> that detached VMA logic), but not in any other case. We have a lot of
> other failure points in these syscalls, and would require extensive
> refactoring to patch this up (very likely with an inevitable
> performance regression, as we saw in this case).
>
> - Despite allowing for partial failure, this patch set always keeps
> the sealed VMAs untouched (so that invariant isn't broken). The munmap
> semantics remain untouched (and POSIXly correct) due to the detached
> VMA logic.
>
> - I personally have not heard of a single attack making use of this,
> and the performance hit is very measurable and exists _for every
> user_, despite mseal being a very niche feature (I cannot find a
> single user of mseal upstream, both in debian code search, github,
> chromium, v8, glibc, and what have you).
>
...

I really have no disagreement with the above statements, but looking at
this further:

vma_to_resize() is called in 2 places:

1. mremap() syscall
mremap() calls vma_lookup() and then later calls vma_to_resize(), which
also calls vma_lookup() in the first 5 lines of the function.

2. mremap_to() static function
mremap_to() is called only from mremap(), but earlier than
vma_to_resize().

If we move the vma check to mremap() after finding the vma, then we can
avoid partial failures due to mseal(). We should probably check as much
as possible there, but that change would be too large for a regression
fix.

IOW, the check was in the wrong place and was the wrong check, but we can
use your check and move it up ~15 lines and everything will be the same
and faster.

For a later patch, there is an opportunity to make this even faster by
passing the vma through to vma_to_resize(). We could remove another walk
of the vma tree. Probably not necessary to fix the regression, but it
would at least reduce the instruction count, if not give a performance
increase (depending on cache use).

Thanks,
Liam
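
To make the ordering argument above concrete, here is a rough userspace
sketch. It is a toy model, not kernel code: every toy_* name is
hypothetical, the "tree" is a flat array, and unmapping is simplified to
dropping any unsealed VMA that overlaps the destination range. The late
variant checks the seal only after the destination has been unmapped (so
it can fail partially); the early variant checks right after the first
lookup and, as an assumption about the possible later patch, reuses the
vma it already found instead of walking the tree again.

```c
/* Toy userspace model, NOT kernel code: all toy_* names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vma {
	unsigned long start, end;	/* covers [start, end) */
	bool sealed;			/* stands in for a sealed mapping */
	bool mapped;
};

struct toy_mm {
	struct toy_vma *vmas;
	size_t nr;
	unsigned long lookups;		/* counts simulated tree walks */
};

/* Stand-in for vma_lookup(): a linear scan instead of a tree walk. */
static struct toy_vma *toy_vma_lookup(struct toy_mm *mm, unsigned long addr)
{
	assert(mm != NULL);
	mm->lookups++;
	for (size_t i = 0; i < mm->nr; i++) {
		struct toy_vma *v = &mm->vmas[i];

		if (v->mapped && addr >= v->start && addr < v->end)
			return v;
	}
	return NULL;
}

/* Simplified unmap: drop every unsealed VMA overlapping [start, end). */
static void toy_unmap_range(struct toy_mm *mm, unsigned long start,
			    unsigned long end)
{
	for (size_t i = 0; i < mm->nr; i++) {
		struct toy_vma *v = &mm->vmas[i];

		if (v->mapped && !v->sealed && v->start < end && v->end > start)
			v->mapped = false;
	}
}

/* Late check: destination is unmapped before the source seal is tested. */
int toy_mremap_late_check(struct toy_mm *mm, unsigned long src,
			  unsigned long dst, unsigned long len)
{
	struct toy_vma *vma = toy_vma_lookup(mm, src);

	if (!vma)
		return -EFAULT;
	toy_unmap_range(mm, dst, dst + len);	/* side effect happens first */
	vma = toy_vma_lookup(mm, src);		/* second walk */
	if (!vma)
		return -EFAULT;
	if (vma->sealed)
		return -EPERM;			/* partial failure */
	vma->start = dst;
	vma->end = dst + len;
	return 0;
}

/* Early check: test the seal right after the first lookup, reuse the vma. */
int toy_mremap_early_check(struct toy_mm *mm, unsigned long src,
			   unsigned long dst, unsigned long len)
{
	struct toy_vma *vma = toy_vma_lookup(mm, src);

	if (!vma)
		return -EFAULT;
	if (vma->sealed)
		return -EPERM;			/* nothing touched yet */
	toy_unmap_range(mm, dst, dst + len);
	vma->start = dst;
	vma->end = dst + len;
	return 0;
}
```

With a sealed source and an unsealed destination, both variants return
-EPERM, but only the late one has already dropped the destination
mapping by then, and it has paid for two lookups instead of one.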