Re: robinhood, fanotify name info events and lustre changelog

Amir Goldstein wrote on Thu, May 28, 2020:
> Since you started this thread privately, I am replying privately,
> but if you don't mind, please respond with CC to linux-fsdevel, linux-api
> and also lustre lists if you like, so other developers may participate in
> the discussion.

No problem going public; added linux-fsdevel, linux-api as suggested and
robinhood-devel for the robinhood side.
It might be interesting to retrofit lustre changelogs into the fanotify
API at some point, but I don't think that is likely to happen, so let's
start small for now :)

> > (Probably the same as most filesystem indexers would want, I would use
> > it for robinhood[1] - it normally consumes lustre changelogs and not
> > local vfs events so $job doesn't really care, but I would use a fanotify
> > mode for home once it becomes useable because why not :D)
> >
> > [1] https://github.com/cea-hpc/robinhood/
> 
> This looks very interesting.
> So, do you intend to integrate fanotify with robinhood as a hobby project?
> I wonder, I did not find much evidence of robinhood being used outside of
> the Lustre community and without the Lustre Changelog.
> At least since 2008, I see no public discussions on devel lists
> and the only changes seem related to the Lustre mode.

I know robinhood has also been used to purge NFS home directories
("temp" directories that are less restricted than homes in volume but
get purged after x days), both at CEA and in other companies who reached
out to me privately so I cannot name them.
FWIW on this side netapp filers support something similar to changelogs
in the form of audit logs; we never bothered implementing support for
them, but a patch would probably be accepted if someone did. With linux
knfsd, fanotify on the server side might work, but I haven't tried.

As far as I know that means people using robinhood on NFS just run full
filesystem scans every day/week/x.



That being said, it is also true that robinhood has very few users
outside of the lustre community; I use it for manual file scrubbing
(verifying checksums on a semi-regular basis) at home. As you pointed
out, that really is in the realm of hobby project even if that helped
find a few bugs.


> I am asking because this project looks like it could be interesting for $job.
> I was looking for a "champion app" to demonstrate new fanotify features.
> I chose inotify-tools for the demo, because it was the easiest to adapt,
> but was going to start a more serious look into Watchman.
> Watchman seems to be in heavy use in Facebook and actively maintained.
> Its starting point is inotify (+fs scanner of course), so I expect it
> would be an easier fit than starting from Lustre Changelog. Or not?

robinhood (current master branch) is quite heavily tied to lustre. I
think Cray had started porting the code to use VFS file handles instead
of lustre FIDs to make it easier to use but that never quite finished.
OTOH, robinhood v4 has no dependency on lustre, but is still a work in
progress. Quentin in Cc has a proof of concept for ingesting
changelogs.
It has been designed with that in mind, so it should be much easier to
integrate there (the lustre portion just converts changelogs to a
robinhood-specific 'fsevents' format which is then injected, so there
would be just that fanotify->fsevents conversion to do), but it is still
very young and doesn't have all the features of v3, so it might be less
suited as a champion project.

I'm not sure what to advise there. From what I'm reading about watchman,
it would probably be easier to integrate with than robinhood v3, so if
you want code to go into a version that is currently in use, it might be
easier to go with that.
(if you do want to do the work for robinhood v3 though I think we would
be happy to integrate the change even with v4 underway, but I am not
responsible for that so cannot make promises; we'd probably be happier
with v4 as a target in the long run)


> I couldn't find the documentation for Lustre Changelog format, because
> the name of the feature is not very Google friendly.
> But looking at the robinhood source code, the direction we are going
> with fanotify seems to be consistent with the designs of Lustre Changelog.
> 
> I am including some snippets from robinhood  chglog_reader.c
> that Jan may find interesting:
> 
> #define PFID(_pid) (_pid)->fs_key, (_pid)->inode
> #define CL_NAME_ARG(_rec_) PFID(&(_rec_)->cr_pfid), (_rec_)->cr_namelen, \
>         rh_get_cl_cr_name(_rec_)
> #define CL_EXT_FORMAT   "s="DFID" sp="DFID" %.*s"
> #elif defined(HAVE_CHANGELOG_EXTEND_REC)
>         if (fid_is_sane(&rec->cr_sfid)) {
>             len = snprintf(curr, left, " " CL_EXT_FORMAT,
>                            PFID(&rec->cr_sfid),
>                            PFID(&rec->cr_spfid),
>                            changelog_rec_snamelen((CL_REC_TYPE *)rec),
>                            changelog_rec_sname((CL_REC_TYPE *)rec));
> 
>     /* parent id is always set when name is (Cf. comment in lfs.c) */
> 
>             /* Ensure compatibility with older Lustre versions:
>              * push RNMFRM to remove the old path from NAMES table.
>              * push RNMTO to add target path information.
>              */
> 
> It looks like the Lustre change record "extended" format is on par with
> the information that the fanotify name info events that patches v3 [1]
> are providing for events "on child" (e.g FAN_MODIFY).

Here are a few example changelogs (as logged by robinhood) so you get an
idea; but it looks like you understood this correctly (filenames and
jobnames redacted for privacy; we don't actually use the jobnames for
robinhood itself):
2020/05/28 03:49:26 [383/2] fsname-MDT0001: 62545421787 13TRUNC 1590630534.494881281 0xe t=[0xcc005c7aa:0x12cf7:0x0] J=jobname
2020/05/28 03:49:26 [383/2] fsname-MDT0001: 62545421788 11CLOSE 1590630534.495782850 0x43 t=[0xcc005c7aa:0x12cf7:0x0] J=jobname
2020/05/28 03:49:26 [383/2] fsname-MDT0001: 62545422212 01CREAT 1590630535.038294162 0x0 t=[0xcc0056422:0x1e071:0x0] p=[0xcc0056422:0x1e005:0x0] filename J=jobname
2020/05/28 03:49:26 [383/2] fsname-MDT0001: 62545448338 08RENME 1590630550.659753428 0x0 t=[0:0x0:0x0] p=[0xcc005f145:0x12da:0x0] filename_from s=[0xcc00600d7:0xe0:0x0] sp=[0xcc005f145:0x12da:0x0] filename_to J=jobname
2020/05/28 03:49:26 [383/2] fsname-MDT0001: 62545449617 06UNLNK 1590630551.756078437 0x1 t=[0xcc005ff7f:0x42bd:0x0] p=[0xcc0057c27:0x1dc1d:0x0] filename J=jobname
2020/05/28 03:49:58 [383/2] fsname-MDT0001: 62545494822 14SATTR 1590630574.376208143 0x14 t=[0xcc005f9a4:0xa9a6:0x0] J=jobname
2020/05/28 03:51:02 [383/2] fsname-MDT0001: 62545616687 02MKDIR 1590630648.489224641 0x0 t=[0xcc0045fd0:0x8e4b:0x0] p=[0xcc0036d90:0x1:0x0] 

So you always have the fid of the object being acted on, plus (parent
fid + name component) for the source and destination when they matter
(e.g. setattr won't have any name, create will have one, and rename will
have both).
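
To make the layout of the records above concrete, here is a minimal
sketch of a parser for those log lines. The regular expression and the
field names (`target`, `parent`, `name`, `sfid`, `sparent`, `sname`,
`job`) are assumptions inferred only from the seven example lines; this
is not robinhood's or lustre's own parser, and real names containing
" s=" or " J=" would confuse it.

```python
import re
from typing import Optional

# A FID as printed in the logs, e.g. [0xcc005c7aa:0x12cf7:0x0] or [0:0x0:0x0].
_FID = r"\[[0-9a-fx:]+\]"

# Layout assumed from the example lines only:
#   <mdt>: <index> <type> <time> <flags> t=<fid> [p=<fid> <name>]
#   [s=<fid> sp=<fid> <name>] [J=<job>]
_RECORD = re.compile(
    r"(?P<mdt>\S+): (?P<index>\d+) (?P<type>\d{2}[A-Z0-9_]+) "
    r"(?P<time>\d+\.\d+) (?P<flags>0x[0-9a-f]+) "
    r"t=(?P<target>" + _FID + r")"
    r"(?: p=(?P<parent>" + _FID + r")(?: (?P<name>.*?))?)?"
    r"(?: s=(?P<sfid>" + _FID + r") sp=(?P<sparent>" + _FID + r")"
    r"(?: (?P<sname>.*?))?)?"
    r"(?: J=(?P<job>\S+))?\s*$"
)


def parse_record(line: str) -> Optional[dict]:
    """Return the fields of one logged changelog line, or None if the
    line does not follow the assumed layout. Absent fields are None."""
    m = _RECORD.search(line)
    return m.groupdict() if m else None


if __name__ == "__main__":
    sample = (
        "2020/05/28 03:49:26 [383/2] fsname-MDT0001: 62545422212 01CREAT "
        "1590630535.038294162 0x0 t=[0xcc0056422:0x1e071:0x0] "
        "p=[0xcc0056422:0x1e005:0x0] filename J=jobname"
    )
    print(parse_record(sample))
```

Records without a name (TRUNC, CLOSE, SATTR in the examples) simply come
back with `parent` and `name` set to None, which mirrors the "name only
when it matters" rule described above.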


> It is not clear to me if Lustre Changelog provides the "extended"
> record for create/rename/delete, but we were not planning to do that.

Ok.

> There is one critical difference between a changelog and fanotify events.
> fanotify events are delivered asynchronously and may be delivered out
> of order, so an application must not rely on path information to update
> internal records without using fstatat(2) to check the actual state of the
> object in the filesystem.

lustre changelogs are asynchronous but the order is guaranteed, so we
might rely on that for robinhood v4, but the full path is not computed
from information in the changelogs. Instead the design plan is to have a
process scrub the database for files that got updated since the last
path update and fix paths with fstatat, so I think it might work; but
that unfortunately hasn't been implemented yet.
(so the db update would be done in multiple steps; but it should also be
possible to supplement information in the pipeline, because lustre
changelogs don't carry size etc., which are in the db, and that step
might also be able to take care of path updates. I guess both models
should work for fanotify since the stat itself is synchronous and you
can get the path from /proc/self/fd/x on local filesystems; that doesn't
work on lustre, but there's a fid2path helper.)
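
The stat-then-resolve step discussed here can be sketched as follows for
a local Linux filesystem. This is only an illustration: `check_entry` is
a hypothetical helper name, and it assumes the caller already holds an
open fd on the parent directory (turning an event's parent file handle
into such an fd, e.g. with open_by_handle_at(2), is left out). In
Python, `os.stat(..., dir_fd=...)` is the fstatat(2) equivalent.

```python
import os
from typing import Optional, Tuple


def check_entry(dir_fd: int, name: str) -> Tuple[str, Optional[os.stat_result]]:
    """Look up the current state of (parent directory fd, name).

    Returns the path the entry would have right now and its stat
    result, or None for the stat if the entry no longer exists (for
    instance because a later unlink or rename has already happened).
    """
    # Linux-only: the kernel exposes the current path of an open fd here.
    parent_path = os.readlink(f"/proc/self/fd/{dir_fd}")
    try:
        # fstatat(dir_fd, name, AT_SYMLINK_NOFOLLOW)
        st = os.stat(name, dir_fd=dir_fd, follow_symlinks=False)
    except FileNotFoundError:
        st = None
    return os.path.join(parent_path, name), st


if __name__ == "__main__":
    fd = os.open(".", os.O_RDONLY | os.O_DIRECTORY)
    try:
        print(check_entry(fd, "some-name-from-an-event"))
    finally:
        os.close(fd)
```

Because the state is read at processing time rather than taken from the
event, a reordered or stale event ends up as a "not found" or as the
newest attributes, rather than as wrong data in the db.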

robinhood v3 systematically does a stat and recomputes path from fid.

> For that reason, we defined the FAN_DIR_MODIFY event, which carries
> info of parent fid and name that can be used for fstatat(2).
> As of yesterday, FAN_DIR_MODIFY is disabled in master, so will not be
> available in v5.7. We are planning to re-enable it in the future with an
> appropriate fanotify_init(2) flag for reporting file names.

Yes that started this thread :)
I'm happy to run tests with a custom branch if you need me to; we
normally run rhel kernels so I would need to recompile anyway.

Thanks!
-- 
Dominique


