On Wed, Nov 23, 2022 at 2:11 PM Jan Kara <jack@xxxxxxx> wrote: > > On Mon 21-11-22 18:40:06, Amir Goldstein wrote: > > On Wed, Nov 16, 2022 at 6:24 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > > > > Can't we introduce some SRCU lock / unlock into > > > > > > file_start_write() / file_end_write() and then invoke synchronize_srcu() > > > > > > during checkpoint after removing ignore marks? It will be much cheaper as > > > > > > we don't have to flush all dirty data to disk and also writes can keep > > > > > > flowing while we wait for outstanding writes straddling checkpoint to > > > > > > complete. What do you think? > > > > > > > > > > Maybe, but this is not enough. > > > > > Note that my patches [1] are overlapping fsnotify_mark_srcu with > > > > > file_start_write(), so we would need to overlay fsnotify_mark_srcu > > > > > with this new "modify SRCU". > > > > > > > > > > [1] https://github.com/amir73il/linux/commits/fanotify_pre_content > > > > > > > > Yes, I know that and frankly, that is what I find somewhat ugly :) I'd rather > > > > have the "modify SRCU" cover the whole region we need - which means > > > > including the generation of PRE_MODIFY event. > > > > > > > > > > Yeh, it would be great if we can pull this off. > > > > OK. I decided to give this a shot, see: > > > > https://github.com/amir73il/linux/commits/sb_write_barrier > > > > It is just a sketch to show the idea, very lightly tested. > > What I did is, instead of converting all the sb_start,end_write() > > call sites, which would be a huge change, only callers of > > sb_start,end_write_srcu() participate in the "modify SRCU". > > > > I then converted all the dir modify call sites and some other > > call sites to use helpers that take SRCU and call pre-modify hooks. > > > > [...] > > I've glanced over the changes and yes, that's what I was imagining :). FYI, just pushed an update to that branch which includes *write_srcu wrappers for all the call sites of security_file_permission(file, MAY_WRITE), mostly file_start_write_area()/file_end_write_srcu(). For async write, write_srcu only covers the pre-modify event and the io submittion. > > > > > > > The technical problem I see is how to deal with AIO / io_uring because > > > > > > SRCU needs to be released in the same context as it is acquired - that > > > > > > would need to be consulted with Paul McKenney if we can make it work. And > > > > > > another problem I see is that it might not be great to have this > > > > > > system-wide as e.g. on networking filesystems or pipes writes can block for > > > > > > really long. > > > > > > > > > > I averted this problem for now - file data writes are not covered by > > s_write_srcu with my POC branch. > > Since you've made the SRCU per sb there is no problem with writes blocking > too long on some filesystems. I've asked RCU guys about the problem with > SRCU being acquired / released from different contexts. Logically, it seems > it should be possible to make this work but maybe I miss some technical > detail. > > > The rationale is that with file data write, HSM would anyway need to > > use fsfreeze() to get any guarantee, so maybe s_write_srcu is not really > > useful here?? > > That was a wrong statement. For data writes HSM would need to do syncfs() after sb_write_barrier(), not fsfreeze. With the current POC branch, fsfreeze would only be needed to wait for the async write completions. > > It might be useful to use s_write_srcu to cover the pre-modify event > > up to after sb_start_write() in file write/aio/io_uring, but not byond it, > > so that sb_write_barrier()+fsfreeze() will provide full coverage for > > in-progress writes. > > > > Please let me know if this plan sounds reasonable. > > Let's see what RCU guys reply. I'd prefer to cover the whole write for > simplicity if it is reasonably possible. > OK. Other backup plans: - Set a flag in pre-modify events from async write so HSM knowns that event is not "atomic" with modification - HSM may deny all events on this sort if it needs to record a change or it may mark the snapshot "requires freeze" - Reliably deliver post-write events only for the async write completions (also on error), so HSM can track only those event pairs Thanks, Amir.