On Wed, Nov 16, 2022 at 6:24 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > Can't we introduce some SRCU lock / unlock into > > > > file_start_write() / file_end_write() and then invoke synchronize_srcu() > > > > during checkpoint after removing ignore marks? It will be much cheaper as > > > > we don't have to flush all dirty data to disk and also writes can keep > > > > flowing while we wait for outstanding writes straddling checkpoint to > > > > complete. What do you think? > > > > > > Maybe, but this is not enough. > > > Note that my patches [1] are overlapping fsnotify_mark_srcu with > > > file_start_write(), so we would need to overlay fsnotify_mark_srcu > > > with this new "modify SRCU". > > > > > > [1] https://github.com/amir73il/linux/commits/fanotify_pre_content > > > > Yes, I know that and frankly, that is what I find somewhat ugly :) I'd rather > > have the "modify SRCU" cover the whole region we need - which means > > including the generation of PRE_MODIFY event. > > > > Yeh, it would be great if we can pull this off. OK. I decided to give this a shot, see: https://github.com/amir73il/linux/commits/sb_write_barrier It is just a sketch to show the idea, very lightly tested. What I did is, instead of converting all the sb_start,end_write() call sites, which would be a huge change, only callers of sb_start,end_write_srcu() participate in the "modify SRCU". I then converted all the dir modify call sites and some other call sites to use helpers that take SRCU and call pre-modify hooks. [...] > > > > The technical problem I see is how to deal with AIO / io_uring because > > > > SRCU needs to be released in the same context as it is acquired - that > > > > would need to be consulted with Paul McKenney if we can make it work. And > > > > another problem I see is that it might not be great to have this > > > > system-wide as e.g. on networking filesystems or pipes writes can block for > > > > really long. > > > > I averted this problem for now - file data writes are not covered by s_write_srcu with my POC branch. The rationale is that with file data write, HSM would anyway need to use fsfreeze() to get any guarantee, so maybe s_write_srcu is not really useful here?? It might be useful to use s_write_srcu to cover the pre-modify event up to after sb_start_write() in file write/aio/io_uring, but not byond it, so that sb_write_barrier()+fsfreeze() will provide full coverage for in-progress writes. Please let me know if this plan sounds reasonable. > > > > Final question is how to expose this to userspace because this > > > > functionality would seem useful outside of filesystem notification space so > > > > probably do not need to tie it to that. In the current POC branch, nothing calls sb_write_barrier() yet. I was thinking of using it when flushing marks, maybe with the FAN_MARK_SYNC flag that I proposed. For general purpose API, the semantics would need to be better defined, as with this "opt-in" implementation, only some of the modification operations are covered by the 'sb write barrier'. Thanks, Amir.