Re: [PATCH 06/25] fsnotify: Attach marks to object via dedicated head structure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon 06-02-17 20:44:45, Miklos Szeredi wrote:
> On Wed, Feb 1, 2017 at 11:44 AM, Jan Kara <jack@xxxxxxx> wrote:
> > Currently notification marks are attached to object (inode or vfsmnt) by
> > a hlist_head in the object. The list is also protected by a spinlock in
> > the object. So while there is any mark attached to the list of marks,
> > the object must be pinned in memory (and thus e.g. last iput() deleting
> > inode cannot happen). Also for list iteration in fsnotify() to work, we
> > must hold fsnotify_mark_srcu lock so that mark itself and
> > mark->obj_list.next cannot get freed. Thus we are required to wait for
> > response to fanotify events from userspace process with
> > fsnotify_mark_srcu lock held. That causes issues when userspace process
> > is buggy and does not reply to some event - basically the whole
> > notification subsystem gets eventually stuck.
> >
> > So to be able to drop fsnotify_mark_srcu lock while waiting for
> > response, we have to pin the mark in memory and make sure it stays in
> > the object list (as removing the mark waiting for response could lead to
> > lost notification events for groups later in the list). However we don't
> > want inode reclaim to block on such mark as that would lead to system
> > just locking up elsewhere.
> >
> > This commit tries to pave a way towards solving these conflicting
> > lifetime needs. Instead of anchoring the list of marks directly in the
> > object, we anchor it in a dedicated structure (fsnotify_mark_connector)
> > and just point to that structure from the object. In the following patch
> > we will also protect the list by a spinlock contained in that structure.
> > With this, we will be able to detach notification marks from object
> > without having to modify the list itself.
> 
> This is still a bit big to review easily.  I'd suggest further splitup into:
> 
> - move hlist_head from inode/mnt into connector
> - move inode/mnt ptr from mark into connector (no refcounting changes)
> - inode refcounting cleanup

OK, I got back to this after sorting out some more urgent stuff with
BDIs... I did the split and the result really looks more manageable. Thanks
for suggestion!

> > diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
> > index 5a4ec309e283..3dd2e0ece262 100644
> > --- a/fs/notify/dnotify/dnotify.c
> > +++ b/fs/notify/dnotify/dnotify.c
> > @@ -69,8 +69,8 @@ static void dnotify_recalc_inode_mask(struct fsnotify_mark *fsn_mark)
> >         if (old_mask == new_mask)
> >                 return;
> >
> > -       if (fsn_mark->inode)
> > -               fsnotify_recalc_inode_mask(fsn_mark->inode);
> > +       if (fsn_mark->connector->inode
> 
> This doesn't make sense in the context of this patch.  Now
> mark->connector is set to NULL where previously mark->inode was set to
> null.   So the right condition would be "if (fsn_mark->connector)
> ...".

Right, I'll fix that.

> > diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> > index b41515d3f081..421c7431a16e 100644
> > --- a/fs/notify/fsnotify.c
> > +++ b/fs/notify/fsnotify.c
> > @@ -193,6 +193,7 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
> >         struct hlist_node *inode_node = NULL, *vfsmount_node = NULL;
> >         struct fsnotify_mark *inode_mark = NULL, *vfsmount_mark = NULL;
> >         struct fsnotify_group *inode_group, *vfsmount_group;
> > +       struct fsnotify_mark_connector *inode_conn, *vfsmount_conn;
> >         struct mount *mnt;
> >         int idx, ret = 0;
> >         /* global tests shouldn't care about events on child only the specific event */
> > @@ -210,8 +211,8 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
> >          * SRCU because we have no references to any objects and do not
> >          * need SRCU to keep them "alive".
> >          */
> > -       if (hlist_empty(&to_tell->i_fsnotify_marks) &&
> > -           (!mnt || hlist_empty(&mnt->mnt_fsnotify_marks)))
> > +       if (!to_tell->i_fsnotify_marks &&
> > +           (!mnt || !mnt->mnt_fsnotify_marks))
> 
> The checks against hlist_empty() could still be useful to optimize
> away the memory barrier imposed by srcu_read_lock() in case there were
> marks on the object but not anymore (i.e. the connector is non-NULL,
> but the hlist is empty).

Yes, at this point in the series it would be a worthwhile optimization. But
since following patches change the rules so that empty connector structures
get discarded, I don't think making the optimization in this patch is worth
it.

> >                 return 0;
> >         /*
> >          * if this is a modify event we may need to clear the ignored masks
> > @@ -226,16 +227,24 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
> >         idx = srcu_read_lock(&fsnotify_mark_srcu);
> >
> >         if ((mask & FS_MODIFY) ||
> > -           (test_mask & to_tell->i_fsnotify_mask))
> > -               inode_node = srcu_dereference(to_tell->i_fsnotify_marks.first,
> > -                                             &fsnotify_mark_srcu);
> > +           (test_mask & to_tell->i_fsnotify_mask)) {
> > +               inode_conn = to_tell->i_fsnotify_marks;
> 
> Needs lockless_dereference().

Right. Will fix.

> > diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h
> > index 0a3bc2cf192c..7a95436dc8d3 100644
> > --- a/fs/notify/fsnotify.h
> > +++ b/fs/notify/fsnotify.h
...
> >  /* run the list of all marks associated with inode and destroy them */
> >  static inline void fsnotify_clear_marks_by_inode(struct inode *inode)
> >  {
> > +       if (!inode->i_fsnotify_marks)
> > +               return;
> 
> Should move the above check inside fsnotify_destroy_marks().  Locking
> rules also dictate that the check be done under inode lock, although
> that may not matter in practice.

Yeah, it does not matter in practice but it's just nicer. Moved (and the
other similar occurences as well).

> > diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> > index 44836e539169..ce2314ae96d0 100644
> > --- a/fs/notify/mark.c
> > +++ b/fs/notify/mark.c
...
> > @@ -249,6 +252,16 @@ void fsnotify_destroy_marks(struct hlist_head *head, spinlock_t *lock)
> >                 fsnotify_destroy_mark(mark, mark->group);
> >                 fsnotify_put_mark(mark);
> >         }
> > +       /*
> > +        * Skip freeing of connector when inode is just unlinked. We leave that
> > +        * until destroy_inode() is called at which moment nobody will be
> > +        * looking at the inode anymore.
> > +        */
> > +       if ((*connp)->flags & FSNOTIFY_OBJ_TYPE_INODE &&
> > +           !((*connp)->inode->i_state & I_CLEAR))
> > +               return;
> 
> This is ugly.   Why not rather add a fsnotify_connector_free() and
> call that explicitly on inode/vfsmount destruction?

Good idea. Changed.

> > +static int fsnotify_add_mark_list(struct fsnotify_mark *mark,
> > +                                 struct inode *inode, struct vfsmount *mnt,
> > +                                 int allow_dups)
> >  {
> >         struct fsnotify_mark *lmark, *last = NULL;
> > +       struct fsnotify_mark_connector *conn;
> > +       struct fsnotify_mark_connector **connp;
> >         int cmp;
> > +       int err = 0;
> > +
> > +       BUG_ON(!inode && !mnt);
> 
> Use WARN_ON() instead.

OK.

> > +       if (inode)
> > +               connp = &inode->i_fsnotify_marks;
> > +       else
> > +               connp = &real_mount(mnt)->mnt_fsnotify_marks;
> > +restart:
> > +       spin_lock(&mark->lock);
> > +       conn = fsnotify_grab_connector(*connp);
> 
> This looks a bit too subtle.  We have the inode/vfsmount.  We have the
> matching lock.  Could store the address of the lock in a local
> variable and use that for locking.

Done.

> > +       if (!conn) {
> > +               spin_unlock(&mark->lock);
> > +               conn = kmem_cache_alloc(fsnotify_mark_connector_cachep,
> > +                                       GFP_KERNEL);
> > +               if (!conn)
> > +                       return -ENOMEM;
> > +               INIT_HLIST_HEAD(&conn->list);
> > +               if (inode) {
> > +                       conn->flags = FSNOTIFY_OBJ_TYPE_INODE;
> > +                       conn->inode = inode;
> > +               } else {
> > +                       conn->flags = FSNOTIFY_OBJ_TYPE_VFSMOUNT;
> > +                       conn->mnt = mnt;
> > +               }
> > +               if (cmpxchg(connp, NULL, conn)) {
> > +                       /* Someone else created list structure for us */
> > +                       kmem_cache_free(fsnotify_mark_connector_cachep, conn);
> > +               }
> 
> Setting *connp should be done with inode/vfsmount lock already
> regained (if I understand the locking rules correctly) and then no
> need to have cmpxchg magic in there.

Yes, it will have to come in following patches but I agree it is not needed
here. Changed.

> > diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> > index 487246546ebe..01df00327683 100644
> > --- a/include/linux/fsnotify_backend.h
> > +++ b/include/linux/fsnotify_backend.h
> > @@ -194,6 +194,27 @@ struct fsnotify_group {
> >  #define FSNOTIFY_EVENT_INODE   2
> >
> >  /*
> > + * Inode / vfsmount point to this structure which tracks all marks attached to
> > + * the inode / vfsmount. The reference to inode / vfsmount is held by this
> > + * structure. We destroy this structure when there are no more marks attached
> > + * to it. The structure is protected by fsnotify_mark_srcu.
> > + */
> > +struct fsnotify_mark_connector {
> > +#define FSNOTIFY_OBJ_TYPE_INODE                0x01
> > +#define FSNOTIFY_OBJ_TYPE_VFSMOUNT     0x02
> > +       unsigned int flags;     /* Type of object [lock] */
> > +       union { /* Object pointer [lock] */
> > +               struct inode *inode;
> > +               struct vfsmount *mnt;
> > +       };
> > +       union {
> > +               struct hlist_head list;
> > +               /* Used listing heads to free after srcu period expires */
> > +               struct fsnotify_mark_connector *destroy_next;
> 
> This field is not used in this patch at all.

Yes, will move to later patches.

> > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> > index 11c7ac441624..a878a98f55b3 100644
> > --- a/kernel/audit_tree.c
> > +++ b/kernel/audit_tree.c
> > @@ -248,7 +263,8 @@ static void untag_chunk(struct node *p)
> >
> >         mutex_lock(&entry->group->mark_mutex);
> >         spin_lock(&entry->lock);
> > -       if (chunk->dead || !entry->inode) {
> > +       if (chunk->dead || !entry->connector ||
> > +           !entry->connector->inode) {
> 
> entry->connector->inode will always be positive, at least in this patch.

Correct. I will move it to a later patch.

> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > index cf1fa43512c1..e990de915341 100644
> > --- a/kernel/auditsc.c
> > +++ b/kernel/auditsc.c
> > @@ -1591,7 +1591,7 @@ static inline void handle_one(const struct inode *inode)
> >         struct audit_tree_refs *p;
> >         struct audit_chunk *chunk;
> >         int count;
> > -       if (likely(hlist_empty(&inode->i_fsnotify_marks)))
> > +       if (likely(!inode->i_fsnotify_marks))
> 
> Need to check hlist_empty() as well?  Perhaps add an inline helper to
> fsnotify_backend.h: fsnotify_inode_marks_empty().

I've added the hlist_empty() check. I don't think
fsnotify_inode_marks_empty() helper is needed as it will be used only in
these two places.

Thanks for review!

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux