On 8/14/23, David Hildenbrand <david@xxxxxxxxxx> wrote: > On 14.08.23 10:21, Mateusz Guzik wrote: >> On 8/14/23, David Hildenbrand <david@xxxxxxxxxx> wrote: >>> On 13.08.23 14:33, Mateusz Guzik wrote: >>>> xchg originated in 6e399cd144d8 ("prctl: avoid using mmap_sem for >>>> exe_file serialization"). While the commit message does not explain >>>> *why* the change, clearly the intent was to use mmap_sem less in this >>>> codepath. I found the original submission [1] which ultimately claims >>>> it >>>> cleans things up. >>> >>> More details are apparently in v1 of that patch: >>> >>> https://lore.kernel.org/lkml/1424979417.10344.14.camel@xxxxxxxxxxxx/ >>> >>> Regarding your patch: adding more mmap_write_lock() where avoidable, I'm >>> not so sure. >>> >> >> But exe_file access is already synchronized with the semaphore and >> your own commit added a mmap_read_lock/mmap_read_unlock cycle after >> the xchg in question to accomodate this requirement. > > Yes, we want to handle concurrent fork() ("Don't race with dup_mmap()"), > thus mmap_read_lock. > >> Then mmap_write_lock around the replacement is the obvious thing to do. > > Apparently to you. :) > > mmap_write_lock will block more than fork. For example, concurrent page > faults (without new VMA locking), for no apparent reason to me. > >> >>> Your patch doesn't look (to me) like it is removing a lot of complexity. >>> >> >> The code in the current form makes the reader ask what prompts xchg + >> read-lock instead of mere write-locking. >> >> This is not a hot path either and afaics it can only cause contention >> if userspace is trying to abuse the interface to break the kernel, >> messing with a processs hard at work (which would be an extra argument >> to not play games on kernel side). >> >> That said, no, it does not remove "a lot of complexity". It does >> remove some though at no real downside that I can see. >> >> But then again, it is on people who insist on xchg to justify it. > > Changing it now needs good justification, why we would want to block any > concurrent MM activity. Maybe there is good justification. > > In any case, this commit would have to update the documentation of > replace_mm_exe_file, that spells out existing locking behavior. > Perhaps it will help if I add that the prctl thingy always had a troubled relationship with locking. Last time I looked at it was in 2016, where I found that it was doing down_read to update arg_start/arg_end and others while a consumer in procfs would read them and assert on their sanity. As only a read-lock was held, 2 threads could be used to transiently produce a bogus state as they race with their updates and trigger the BUG. See this commit (but also excuse weirdly bad english ;)) https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ddf1d398e517e660207e2c807f76a90df543a217 Moreover check out the following in prctl_set_auxv: task_lock(current); memcpy(mm->saved_auxv, user_auxv, len); task_unlock(current); any thread in the process can reach that codepath while sharing the same mm, thus this does not lock any updates. Not only that, but a duplicated memcpy onto the area in prctl_set_mm_map does not even take that lock and the code to read this does not take any locks. [Code duplication and synchronization aside, additional points deducted for saved_auxv storing always-NULL pointers instead of adding them on reads.] The above exhausts my willingness to argue about this change, I'm just a passerby. If it is NAKed, I'm dropping the subject. I am willing to do the comment tidy ups if this can go in though, but not before there is consensus. -- Mateusz Guzik <mjguzik gmail.com>