On Wed, Mar 04, 2020 at 05:45:16PM +0000, David Howells wrote: > Hi Paul, > > As part of trying to implement a filesystem information querying system call, > I need to be able to return a list of child mounts of a mount. Children, > however, may be moved with "mount --move", which means that the list can't be > accessed with normal RCU practices. > > For reference, I'm dealing with struct mount and mnt_mounts and mnt_child and > struct mount is released via call_rcu(). > > What does rcu_dereference() guarantee, exactly? It's just that the next hop > is set up when you follow the pointer, right? > > Can I do something like the attached? The mount 'm' is pinned, but I need to > trawl m->mnt_mounts. mount_lock is a seqlock that ticks when mount topology > is rearranged. I *think* it covers ->mnt_mounts. I took a quick but non-exhaustive look, and didn't find any exceptions. However, a stress test with lockdep enabled and with the addition of appropriate lockdep assertions might be helpful here. I trust lockdep quite a bit more than I trust my quick looks. ;-) > Whilst trawling in > non-locked mode, I keep an eye on the seq counter and if it changes, the list > may have been altered and I need to get the real lock and restart. > > The objects shouldn't disappear or be destroyed, so I think I'm safe. Famous last words! ;-) Huh. The mount structure isn't suffering from a shortage of list_head structures, is it? So the following can happen, then? o The __attach_mnt() function adds a struct mount to its parent list, but in a non-RCU manner. Unless there is some other safeguard, the list_add_tail() in this function needs to be list_add_tail_rcu(). o I am assuming that the various non-RCU traversals that I see, for example, next_mnt(), are protected by lock_mount_hash(). Especially skip_mnt_tree(), which uses mnt_mounts.prev. (I didn't find any exceptions, but I don't claim an exhaustive search.) o The umount_tree() function's use of list_del_init() looks like it could trap an RCU reader in the newly singular list formed by the removal. It appears that there are other functions that use list_del_init() on this list, though I cannot claim any sort of familiarity with this code. So, do you need to add a check for child->mnt_child being in this self-referential state within fsinfo_generic_mount_children()? Plus list_del_init() doesn't mark its stores, though some would argue that unmarked stores are OK in this situation. o There might be other operations in need of RCU-ification. Maybe the list_add_tail() in umount_tree(), but it is not immediately clear that this is adding a new element instead of re-inserting an element already exposed to readers. > Thanks, > David > --- > > int fsinfo_generic_mount_children(struct path *path, struct fsinfo_context *ctx) > { > struct fsinfo_mount_child record = {}; > struct mount *m, *child; > int seq = 0; > > m = real_mount(path->mnt); What keeps the thing referenced by "m" from going away? Presumably the caller has nailed it down somehow, but I have to ask... > rcu_read_lock(); > do { > ctx->usage = 0; > read_seqbegin_or_lock(&mount_lock, &seq); Aside: If there was an updater holding the lock along with a flood of readers, everyone would hereafter acquire the lock. Or am I missing a trick here? (I would have expected it to try a few times, and only then acquire the lock. Yeah, I know, more state would be required.) > list_for_each_entry_rcu(child, &m->mnt_mounts, mnt_child) { > if (need_seqretry(&mount_lock, seq)) > break; > if (child->mnt_parent != m) > continue; > record.mnt_unique_id = child->mnt_unique_id; > record.mnt_id = child->mnt_id; > record.notify_sum = 0; > #ifdef CONFIG_SB_NOTIFICATIONS > record.notify_sum += > atomic_read(&child->mnt.mnt_sb->s_change_counter) + > atomic_read(&child->mnt.mnt_sb->s_notify_counter); > #endif > #ifdef CONFIG_MOUNT_NOTIFICATIONS > record.notify_sum += > atomic_read(&child->mnt_attr_changes) + > atomic_read(&child->mnt_topology_changes) + > atomic_read(&child->mnt_subtree_notifications); > #endif > store_mount_fsinfo(ctx, &record); > } > } while (need_seqretry(&mount_lock, seq)); > done_seqretry(&mount_lock, seq); > > rcu_read_unlock(); > > /* End the list with a copy of the parameter mount's details so that > * userspace can quickly check for changes. > */ > record.mnt_unique_id = m->mnt_unique_id; > record.mnt_id = m->mnt_id; > record.notify_sum = 0; > #ifdef CONFIG_SB_NOTIFICATIONS > record.notify_sum += > atomic_read(&m->mnt.mnt_sb->s_change_counter) + > atomic_read(&m->mnt.mnt_sb->s_notify_counter); > #endif > #ifdef CONFIG_MOUNT_NOTIFICATIONS > record.notify_sum += > atomic_read(&m->mnt_attr_changes) + > atomic_read(&m->mnt_topology_changes) + > atomic_read(&m->mnt_subtree_notifications); > #endif > store_mount_fsinfo(ctx, &record); > return ctx->usage; > } Other than that, looks legit! ;-) Thanx, Paul