On Thu, May 30, 2024 at 1:32 PM Christian Brauner <brauner@xxxxxxxxxx> wrote: > > On Wed, May 29, 2024 at 04:41:32PM -0400, Josef Bacik wrote: > > NOTE: > > This is compile tested only. It's also based on an out of tree feature branch > > from Amir that I'm extending to add page fault content events to allow us to > > have on-demand binary loading at exec time. If you need more context please let > > me know, I'll push my current branch somewhere and wire up how I plan to use > > this patch so you can see it in better context, but hopefully I've described > > what I'm trying to accomplish enough that this leads to useful discussion. > > > > > > Currently we have ->i_writecount to control write access to an inode. > > Callers may deny write access by calling deny_write_access(), which will > > cause ->i_writecount to go negative, and allow_write_access() to push it > > back up to 0. > > > > This is used in a few ways, the biggest user being exec. Historically > > we've blocked write access to files that are opened for executing. > > fsverity is the other notable user. > > > > With the upcoming fanotify feature that allows for on-demand population > > of files, this blanket policy of denying writes to files opened for > > executing creates a problem. We have to issue events right before > > denying access, and the entire file must be populated before we can > > continue with the exec. > > > > This creates a problem for users who have large binaries they want to > > populate on demand. Inside of Meta we often have multi-gigabyte > > binaries, even on the order of tens of gigabytes. Pre-loading these > > large files is costly, especially when large portions of the binary may > > never be read (think debuginfo). > > > > In order to facilitate on-demand loading of binaries we need to have a > > way to punch a hole in this exec related write denial. > > Hm. I suggest we try to tackle this in a completely different way. Maybe > I mentioned it during LSFMM but instead of doing this dance we should > try and remove the deny_write_access() mechanisms for exec completely. > > Back in 2021 we removed the remaining VM_DENYWRITE bits but left in > deny_write_access() for exec. Back then I had thought that this was a > bit risky for not too much gain. But looking at this code here I think > we now have an even stronger reason to try and get rid of this > restriction. And I've since changed my mind. See my notes on the > _completely untested_ RFC patch I appended. > This looks much simpler :) Obviously, a proper patch would need to also turn get_write_access() to void and clean up the error handling code of its callers. > Ofc depends on whether Linus still agrees that removing this might be > something we could try. > Crossing my fingers that this will be acceptable. Thanks, Amir.