On Mon 21-11-22 18:40:06, Amir Goldstein wrote: > On Wed, Nov 16, 2022 at 6:24 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > > Can't we introduce some SRCU lock / unlock into > > > > > file_start_write() / file_end_write() and then invoke synchronize_srcu() > > > > > during checkpoint after removing ignore marks? It will be much cheaper as > > > > > we don't have to flush all dirty data to disk and also writes can keep > > > > > flowing while we wait for outstanding writes straddling checkpoint to > > > > > complete. What do you think? > > > > > > > > Maybe, but this is not enough. > > > > Note that my patches [1] are overlapping fsnotify_mark_srcu with > > > > file_start_write(), so we would need to overlay fsnotify_mark_srcu > > > > with this new "modify SRCU". > > > > > > > > [1] https://github.com/amir73il/linux/commits/fanotify_pre_content > > > > > > Yes, I know that and frankly, that is what I find somewhat ugly :) I'd rather > > > have the "modify SRCU" cover the whole region we need - which means > > > including the generation of PRE_MODIFY event. > > > > > > > Yeh, it would be great if we can pull this off. > > OK. I decided to give this a shot, see: > > https://github.com/amir73il/linux/commits/sb_write_barrier > > It is just a sketch to show the idea, very lightly tested. > What I did is, instead of converting all the sb_start,end_write() > call sites, which would be a huge change, only callers of > sb_start,end_write_srcu() participate in the "modify SRCU". > > I then converted all the dir modify call sites and some other > call sites to use helpers that take SRCU and call pre-modify hooks. > > [...] I've glanced over the changes and yes, that's what I was imagining :). > > > > > The technical problem I see is how to deal with AIO / io_uring because > > > > > SRCU needs to be released in the same context as it is acquired - that > > > > > would need to be consulted with Paul McKenney if we can make it work. And > > > > > another problem I see is that it might not be great to have this > > > > > system-wide as e.g. on networking filesystems or pipes writes can block for > > > > > really long. > > > > > > > I averted this problem for now - file data writes are not covered by > s_write_srcu with my POC branch. Since you've made the SRCU per sb there is no problem with writes blocking too long on some filesystems. I've asked RCU guys about the problem with SRCU being acquired / released from different contexts. Logically, it seems it should be possible to make this work but maybe I miss some technical detail. > The rationale is that with file data write, HSM would anyway need to > use fsfreeze() to get any guarantee, so maybe s_write_srcu is not really > useful here?? > > It might be useful to use s_write_srcu to cover the pre-modify event > up to after sb_start_write() in file write/aio/io_uring, but not byond it, > so that sb_write_barrier()+fsfreeze() will provide full coverage for > in-progress writes. > > Please let me know if this plan sounds reasonable. Let's see what RCU guys reply. I'd prefer to cover the whole write for simplicity if it is reasonably possible. > > > > > Final question is how to expose this to userspace because this > > > > > functionality would seem useful outside of filesystem notification space so > > > > > probably do not need to tie it to that. > > In the current POC branch, nothing calls sb_write_barrier() yet. > I was thinking of using it when flushing marks, maybe with the > FAN_MARK_SYNC flag that I proposed. Yes, that's what I'd imagine as well. > For general purpose API, the semantics would need to be better > defined, as with this "opt-in" implementation, only some of the > modification operations are covered by the 'sb write barrier'. Yes, for now we can keep things internal to fsnotify. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR