Re: thoughts about fanotify and HSM

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Nov 23, 2022 at 2:11 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Mon 21-11-22 18:40:06, Amir Goldstein wrote:
> > On Wed, Nov 16, 2022 at 6:24 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > >
> > > > > > Can't we introduce some SRCU lock / unlock into
> > > > > > file_start_write() / file_end_write() and then invoke synchronize_srcu()
> > > > > > during checkpoint after removing ignore marks? It will be much cheaper as
> > > > > > we don't have to flush all dirty data to disk and also writes can keep
> > > > > > flowing while we wait for outstanding writes straddling checkpoint to
> > > > > > complete. What do you think?
> > > > >
> > > > > Maybe, but this is not enough.
> > > > > Note that my patches [1] are overlapping fsnotify_mark_srcu with
> > > > > file_start_write(), so we would need to overlay fsnotify_mark_srcu
> > > > > with this new "modify SRCU".
> > > > >
> > > > > [1] https://github.com/amir73il/linux/commits/fanotify_pre_content
> > > >
> > > > Yes, I know that and frankly, that is what I find somewhat ugly :) I'd rather
> > > > have the "modify SRCU" cover the whole region we need - which means
> > > > including the generation of PRE_MODIFY event.
> > > >
> > >
> > > Yeh, it would be great if we can pull this off.
> >
> > OK. I decided to give this a shot, see:
> >
> > https://github.com/amir73il/linux/commits/sb_write_barrier
> >
> > It is just a sketch to show the idea, very lightly tested.
> > What I did is, instead of converting all the sb_start,end_write()
> > call sites, which would be a huge change, only callers of
> > sb_start,end_write_srcu() participate in the "modify SRCU".
> >
> > I then converted all the dir modify call sites and some other
> > call sites to use helpers that take SRCU and call pre-modify hooks.
> >
> > [...]
>
> I've glanced over the changes and yes, that's what I was imagining :).

FYI, just pushed an update to that branch which includes *write_srcu
wrappers for all the call sites of security_file_permission(file, MAY_WRITE),
mostly file_start_write_area()/file_end_write_srcu().

For async write, write_srcu only covers the pre-modify event and the
io submittion.

>
> > > > > > The technical problem I see is how to deal with AIO / io_uring because
> > > > > > SRCU needs to be released in the same context as it is acquired - that
> > > > > > would need to be consulted with Paul McKenney if we can make it work. And
> > > > > > another problem I see is that it might not be great to have this
> > > > > > system-wide as e.g. on networking filesystems or pipes writes can block for
> > > > > > really long.
> > > > > >
> >
> > I averted this problem for now - file data writes are not covered by
> > s_write_srcu with my POC branch.
>
> Since you've made the SRCU per sb there is no problem with writes blocking
> too long on some filesystems. I've asked RCU guys about the problem with
> SRCU being acquired / released from different contexts. Logically, it seems
> it should be possible to make this work but maybe I miss some technical
> detail.
>
> > The rationale is that with file data write, HSM would anyway need to
> > use fsfreeze() to get any guarantee, so maybe s_write_srcu is not really
> > useful here??
> >

That was a wrong statement. For data writes HSM would need to do
syncfs() after sb_write_barrier(), not fsfreeze.
With the current POC branch, fsfreeze would only be needed to wait
for the async write completions.

> > It might be useful to use s_write_srcu to cover the pre-modify event
> > up to after sb_start_write() in file write/aio/io_uring, but not byond it,
> > so that sb_write_barrier()+fsfreeze() will provide full coverage for
> > in-progress writes.
> >
> > Please let me know if this plan sounds reasonable.
>
> Let's see what RCU guys reply. I'd prefer to cover the whole write for
> simplicity if it is reasonably possible.
>

OK.

Other backup plans:
- Set a flag in pre-modify events from async write so HSM
  knowns that event is not "atomic" with modification
- HSM may deny all events on this sort if it needs to record
  a change or it may mark the snapshot "requires freeze"
- Reliably deliver post-write events only for the async write
  completions (also on error), so HSM can track only those
  event pairs

Thanks,
Amir.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux