Re: robinhood, fanotify name info events and lustre changelog

Hi,

Developer of robinhood v4 here,

> > > [1] https://github.com/cea-hpc/robinhood/

The sources for version 4 live in a separate branch:
https://github.com/cea-hpc/robinhood/tree/v4

Any feedback is welcome =)

I am guessing the most interesting bits for this discussion should be found
here:
https://github.com/cea-hpc/robinhood/blob/v4/include/robinhood/fsevent.h

I am not sure it will matter for the rest of the conversation, but just in case:

    RobinHood v4 has a notion of a "namespace" xattr (like an xattr, but for
    a dentry rather than an inode). It is used to store things that are only
    really tied to the namespace (like the path of an entry). I don't think this
    is really relevant here, so you can probably ignore it.

    Also, RobinHood uses file handles to uniquely identify filesystem entries,
    and this is what is stored in a `struct rbh_id`.

> > I couldn't find the documentation for Lustre Changelog format, because
> > the name of the feature is not very Google friendly.

Yes, this is really unfortunate. For the record, user documentation for Lustre
lives at: http://doc.lustre.org/lustre_manual.xhtml

Chapter 12.1 deals with "Lustre Changelogs" (not much more there than
what Dominique already wrote).

> > There is one critical difference between a changelog and fanotify events.
> > fanotify events are delivered a-synchronically and may be delivered out
> > of order, so application must not rely on path information to update
> > internal records without using fstatat(2) to check the actual state of the
> > object in the filesystem.

> lustre changelogs are asynchronous but the order is guaranteed so we
> might rely on that for robinhood v4,

Yes, we do, to a certain extent: we expect changelog records for a single
filesystem entry to be emitted in the order they happened on the FS. I have
not really given much thought to how things would work in general if that
wasn't true, but I know this is an issue for things that deal with the
namespace: https://jira.whamcloud.com/browse/LU-12574

> but full path is not computed from
> information in the changelogs. Instead the design plan is to have a
> process scrub the database for files that got updated since the last
> path update and fix paths with fstatat, so I think it might work ; but
> that unfortunately hasn't been implemented yet.

Not exactly (I am not sure it really matters, so I'll try to be brief).

The idea to keep paths in sync with what's in the filesystem is to "tag"
entries as we update their name (i.e. after a rename). Then a separate
process comes in, queries for entries that have that "tag", and updates
their path by concatenating their parent's path (if the parents themselves
are not "tagged") with the entries' own, up-to-date name. After that, if
the entry was a directory, its children are "tagged". I simplified a bit, but
that's the idea.
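
To make the above a bit more concrete, here is a rough sketch of that loop.
It is written in Python rather than RobinHood's C, it works on an in-memory
dict instead of the real backend, and every name in it (Entry, rename,
update_paths, the "tagged" field) is made up for illustration. It is not
RobinHood's actual code or API, just the same simplified idea:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    # Hypothetical record; the real database schema is different.
    parent: Optional[str]       # id of the parent entry, None for the root
    name: str                   # up-to-date name, taken from changelog records
    is_dir: bool
    path: Optional[str] = None  # possibly stale while the entry is tagged
    tagged: bool = False        # True when the path has to be recomputed


def rename(entries: Dict[str, Entry], entry_id: str,
           new_parent: str, new_name: str) -> None:
    """Apply a rename record: update the name and parent, tag the entry."""
    entry = entries[entry_id]
    entry.parent, entry.name = new_parent, new_name
    entry.tagged = True


def update_paths(entries: Dict[str, Entry]) -> None:
    """Separate pass: fix the path of tagged entries, then tag children."""
    progress = True
    while progress:
        progress = False
        for entry_id, entry in entries.items():
            if not entry.tagged or entry.parent is None:
                continue
            parent = entries[entry.parent]
            if parent.tagged:
                # The parent's path is stale too, retry on a later pass.
                continue
            entry.path = parent.path.rstrip("/") + "/" + entry.name
            entry.tagged = False
            progress = True
            if entry.is_dir:
                # The children's paths are now stale: tag them in turn.
                for child in entries.values():
                    if child.parent == entry_id:
                        child.tagged = True
```

Note that nothing in there touches the filesystem: the only inputs are the
names and parent ids that were recorded from the changelog.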

So, to be fair, full paths _are_ computed solely from information in the
changelog records, even though it requires a bit of processing on the side.
No additional query to the filesystem for that.

Cheers,
Quentin


