Re: [PATCH v4 00/16] fanotify: add pre-content hooks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Aug 30, 2024 at 10:55:10AM +0200, Amir Goldstein wrote:
> On Thu, Aug 29, 2024 at 11:41 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> >
> > On Wed, Aug 14, 2024 at 05:25:18PM -0400, Josef Bacik wrote:
> > > v3: https://lore.kernel.org/linux-fsdevel/cover.1723228772.git.josef@xxxxxxxxxxxxxx/
> > > v2: https://lore.kernel.org/linux-fsdevel/cover.1723144881.git.josef@xxxxxxxxxxxxxx/
> > > v1: https://lore.kernel.org/linux-fsdevel/cover.1721931241.git.josef@xxxxxxxxxxxxxx/
> > >
> > > v3->v4:
> > > - Trying to send a final verson Friday at 5pm before you go on vacation is a
> > >   recipe for silly mistakes, fixed the xfs handling yet again, per Christoph's
> > >   review.
> > > - Reworked the file system helper so it's handling of fpin was a little less
> > >   silly, per Chinner's suggestion.
> > > - Updated the return values to not or in VM_FAULT_RETRY, as we have a comment
> > >   in filemap_fault that says if VM_FAULT_ERROR is set we won't have
> > >   VM_FAULT_RETRY set.
> > >
> > > v2->v3:
> > > - Fix the pagefault path to do MAY_ACCESS instead, updated the perm handler to
> > >   emit PRE_ACCESS in this case, so we can avoid the extraneous perm event as per
> > >   Amir's suggestion.
> > > - Reworked the exported helper so the per-filesystem changes are much smaller,
> > >   per Amir's suggestion.
> > > - Fixed the screwup for DAX writes per Chinner's suggestion.
> > > - Added Christian's reviewed-by's where appropriate.
> > >
> > > v1->v2:
> > > - reworked the page fault logic based on Jan's suggestion and turned it into a
> > >   helper.
> > > - Added 3 patches per-fs where we need to call the fsnotify helper from their
> > >   ->fault handlers.
> > > - Disabled readahead in the case that there's a pre-content watch in place.
> > > - Disabled huge faults when there's a pre-content watch in place (entirely
> > >   because it's untested, theoretically it should be straightforward to do).
> > > - Updated the command numbers.
> > > - Addressed the random spelling/grammer mistakes that Jan pointed out.
> > > - Addressed the other random nits from Jan.
> > >
> > > --- Original email ---
> > >
> > > Hello,
> > >
> > > These are the patches for the bare bones pre-content fanotify support.  The
> > > majority of this work is Amir's, my contribution to this has solely been around
> > > adding the page fault hooks, testing and validating everything.  I'm sending it
> > > because Amir is traveling a bunch, and I touched it last so I'm going to take
> > > all the hate and he can take all the credit.
> > >
> > > There is a PoC that I've been using to validate this work, you can find the git
> > > repo here
> > >
> > > https://github.com/josefbacik/remote-fetch
> > >
> > > This consists of 3 different tools.
> > >
> > > 1. populate.  This just creates all the stub files in the directory from the
> > >    source directory.  Just run ./populate ~/linux ~/hsm-linux and it'll
> > >    recursively create all of the stub files and directories.
> > > 2. remote-fetch.  This is the actual PoC, you just point it at the source and
> > >    destination directory and then you can do whatever.  ./remote-fetch ~/linux
> > >    ~/hsm-linux.
> > > 3. mmap-validate.  This was to validate the pagefault thing, this is likely what
> > >    will be turned into the selftest with remote-fetch.  It creates a file and
> > >    then you can validate the file matches the right pattern with both normal
> > >    reads and mmap.  Normally I do something like
> > >
> > >    ./mmap-validate create ~/src/foo
> > >    ./populate ~/src ~/dst
> > >    ./rmeote-fetch ~/src ~/dst
> > >    ./mmap-validate validate ~/dst/foo
> > >
> > > I did a bunch of testing, I also got some performance numbers.  I copied a
> > > kernel tree, and then did remote-fetch, and then make -j4
> > >
> > > Normal
> > > real    9m49.709s
> > > user    28m11.372s
> > > sys     4m57.304s
> > >
> > > HSM
> > > real    10m6.454s
> > > user    29m10.517s
> > > sys     5m2.617s
> > >
> > > So ~17 seconds more to build with HSM.  I then did a make mrproper on both trees
> > > to see the size
> > >
> > > [root@fedora ~]# du -hs /src/linux
> > > 1.6G    /src/linux
> > > [root@fedora ~]# du -hs dst
> > > 125M    dst
> > >
> > > This mirrors the sort of savings we've seen in production.
> > >
> > > Meta has had these patches (minus the page fault patch) deployed in production
> > > for almost a year with our own utility for doing on-demand package fetching.
> > > The savings from this has been pretty significant.
> > >
> > > The page-fault hooks are necessary for the last thing we need, which is
> > > on-demand range fetching of executables.  Some of our binaries are several gigs
> > > large, having the ability to remote fetch them on demand is a huge win for us
> > > not only with space savings, but with startup time of containers.
> >
> > So... does this pre-content fetcher already work for regular reads and
> > writes, and now you're looking to wire up page faults?  Or does it only
> > handle page faults?  Judging from Amir's patches I'm guessing the
> > FAN_PRE_{ACCESS,MODIFY} events are new, so this only sends notifications
> > prior to read and write page faults?  The XFS change looks reasonable to
> > me, but I'm left wondering "what does this shiny /do/?"
> >
> 
> I *think* I understand the confusion.
> 
> Let me try to sort it out.
> 
> This patch set collaboration aims to add the functionality of HSM
> service by adding events FS_PRE_{ACCESS,MODIFY} prior to
> read/write and page faults.
> 
> Maybe you are puzzled by not seeing any new read/write hooks?
> This is because the HSM events utilize the existing fsnotify_file_*perm()
> hooks that are in place for the legacy FS_ACCESS_PERM event.
> 
> So why is a new FS_PRE_ACCESS needed?
> Let me first quote commit message of patch 2/16 [1]:
> ---
> The new FS_PRE_ACCESS permission event is similar to FS_ACCESS_PERM,
> but it meant for a different use case of filling file content before
> access to a file range, so it has slightly different semantics.
> 
> Generate FS_PRE_ACCESS/FS_ACCESS_PERM as two seperate events, same as
> we did for FS_OPEN_PERM/FS_OPEN_EXEC_PERM.
> 
> FS_PRE_MODIFY is a new permission event, with similar semantics as
> FS_PRE_ACCESS, which is called before a file is modified.
> 
> FS_ACCESS_PERM is reported also on blockdev and pipes, but the new
> pre-content events are only reported for regular files and dirs.
> 
> The pre-content events are meant to be used by hierarchical storage
> managers that want to fill the content of files on first access.
> ---
> 
> And from my man page draft [2]:
> ---
>        FAN_PRE_ACCESS (since Linux 6.x)
>               Create  an  event before a read from a directory or a file range,
>               that provides an opportunity for the event listener to
> modify the content of the object
>               before the reader is going to access the content in the
> specified range.
>               An additional information record of type
> FAN_EVENT_INFO_TYPE_RANGE is returned
>              for each event in the read buffer.
> ...
> ---
> 
> So the semantics of the two events is slightly different, but also the
> meaning of "an opportunity for the event listener to modify the content".
> 
> FS_ACCESS_PERM already provided this opportunity on old kernels,
> but prior to "Tidy up file permission hooks" series [3] in v6.8, writing
> file content in FS_ACCESS_PERM event context was prone to deadlocks.
> 
> Therefore, an application using FS_ACCESS_PERM may be prone
> for deadlocks, while an application using FAN_PRE_ACCESS should
> be safe in that regard.

Ah, ok, that's where the missing pieces are. :)

--D

> Thanks,
> Amir.
> 
> [1] https://lore.kernel.org/all/a96217d84dfebb15582a04524dc9821ba3ea1406.1723670362.git.josef@xxxxxxxxxxxxxx/
> [2] https://github.com/amir73il/man-pages/commits/fan_pre_path
> [3] https://lore.kernel.org/all/20231122122715.2561213-1-amir73il@xxxxxxxxx/
> 




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux