On Mon, May 15, 2023 at 09:12:49AM -0300, Jason Gunthorpe wrote:
> On Mon, May 15, 2023 at 12:16:21PM +0100, Lorenzo Stoakes wrote:
> > > One thing that came to mind is KVM with "qemu -object memory-backend-file,share=on..."
> > > It is mostly used for pmem emulation.
> > >
> > > Do we have plan B?
> >
> > Yes, we can make it opt-in or opt-out via a FOLL_FLAG. This would be easy
> > to implement in the event of any issues arising.
>
> I'm becoming less keen on the idea of a per-subsystem opt out. I think
> we should make a kernel wide opt out. I like the idea of using lower
> lockdown levels. Lots of things become unavailable in the uAPI when the
> lockdown level increases already.

This would be the 'safest' in the sense that a user can't be surprised by
higher lockdown = access modes disallowed. However, we'd _definitely_ need
to have an opt-in in that instance so io_uring can make use of this
regardless. That's easy to add, however.

If we do go down that road, we can be even stricter/vary what we do at
different levels, right?

>
> > Jason will have some thoughts on this I'm sure. I guess the key question
> > here is - is it actually feasible for this to work at all? Once we
> > establish that, the rest are details :)
>
> Surely it is, but like Ted said, the FS folks are not interested and
> they are at least half the solution.. :'(
>
> The FS also has to actively not write out the page while it cannot be
> write protected unless it copies the data to a stable page. The block
> stack needs the source data to be stable to do checksum/parity/etc
> stuff. It is a complicated subject.

Yes, my sense was that being able to write arbitrarily to these pages _at
all_ was a big issue, not only the dirty tracking aspect.

I guess at some level letting filesystems have such total flexibility as to
how they implement things leaves us in a difficult position.

>
> Jason