Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes: > On Thu, Aug 12, 2021 at 7:48 AM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote: >> >> Given that MAP_PRIVATE for shared libraries is our strategy for handling >> writes to shared libraries perhaps we just need to use MAP_POPULATE or a >> new related flag (perhaps MAP_PRIVATE_NOW) > > No. That would be horrible for the usual bloated GUI libraries. It > might help some (dynamic page faults are not cheap either), but it > would hurt a lot. I wasn't aiming so much at the MAP_POPULATE part but something that would trigger cow from writes to the file. I see code that is close but I don't see any code in the kernel that would implement that currently. Upon reflection I think it will always be difficult to trigger cow from the file write side of the kernel as code that would cow the page in the page cache would cause problems with writable mmaps. > This is definitely a "if you overwrite a system library while it's > being used, you get to keep both pieces" situation. > > The kernel ETXTBUSY thing is purely a courtesy feature, and as people > have noticed it only really works for the main executable because of > various reasons. It's not something user space should even rely on, > it's more of a "ok, you're doing something incredibly stupid, and > we'll help you avoid shooting yourself in the foot when we notice". > > Any distro should make sure their upgrade tools don't just > truncate/write to random libraries executables. Yes. I am trying to come up with advice on how userspace implementations can implement their tools to use other mechanisms that solve the overwriting shared libaries and executables problem that are not broken by design. For a little bit the way Florian Weirmer was talking and the fact that uselib uses MAP_PRIVATE had me thinking that somehow MAP_PRIVATE could be part of the solution. I have now looked into the implementation of MAP_PRIVATE and I since we don't perform the cow on filesystem writes MAP_PRIVATE absolutely can not be part of the solution we recommend to userspace. So today the best advice I can give to userspace is to mark their executables and shared libraries as read-only and immutable. Otherwise a change to the executable file can change what is mapped into memory. MAP_PRIVATE does not help. > And if they do, it's really not a kernel issue. What is a kernel issue is giving people good advice on how to use kernel features to solve real world problems. I have seen the write to a mapped exectuable/shared lib problem, and Florian has seen it. So while rare the problem is real and a pain to debug. > This patch series basically takes this very historical error return, > and simplifies and clarifies the implementation, and in the process > might change some very subtle corner case (unmapping the original > executable entirely?). I hope (and think) it wouldn't matter exactly > because this is a "courtesy error" rather than anything that a sane > setup would _depend_ on, but hey, insane setups clearly exist. Oh yes. I very much agree that the design of this patchset is perfectly fine. I also see that MAP_DENYWRITE is unfortunately broken by design. I vaguely remember the discussion when MAP_DENYWRITE was made a noop because of the denial-of-service aspect of MAP_DENYWRITE. I very much agree that we should strongly encourage userspace not to write to mmaped files. As I am learning with my two year old, it helps to give a constructive suggestion of alternative behavior instead of just saying no. Florian reported that there remains a problem in userspace. So I am coming up with a constructive suggestion. My apologies for going off into the weeds for a moment. Eric