Re: [RFC] fsnotify: allow sleepable child dentry flag update

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:
> On Thu, Oct 13, 2022 at 03:27:19PM -0700, Stephen Brennan wrote:
>
>> So I have two more narrowly scoped strategies to improve the situation. Both are
>> included in the hacky, awful patch below.
>> 
>> First, is to let __fsnotify_update_child_dentry_flags() sleep. This means nobody
>> is holding the spinlock for several seconds at a time. We can actually achieve
>> this via a cursor, the same way that simple_readdir() is implemented. I think
>> this might require moving the declaration of d_alloc_cursor() and maybe
>> exporting it. I had to #include fs/internal.h which is not ok.
>
> Er...  Won't that expose every filesystem to having to deal with cursors?
> Currently it's entirely up to the filesystem in question and I wouldn't
> be a dime on everyone being ready to cope with such surprises...

Hi Al,

I wanted to follow-up on this. Yeah, I didn't realize that this could
cause issues for some filesystems when I wrote it, since I hadn't
considered that rather few filesystems use dcache_readdir(), and so most
aren't prepared to encounter a cursor. Thanks for that catch.

I think I came up with a better solution, which you can see in context
in v3 [1]. The only change I have from the posting there is to eliminate
the unnecssary "child->d_parent != parent" check. So the new idea is to
take a reference to the current child and then go to sleep. That alone
should be enough to prevent dentry_kill() from removing the child from
its parent's list. However, it wouldn't prevent d_move() from moving it
to a new list, nor would it prevent it from moving along the list if it
were a cursor. However, in this situation, we also hold the parent
i_rwsem in write mode, which means that no concurrent rename or readdir
can happen. You can see my full analysis here [2]. So here's the new
approach, if you can call out anything I've missod or confirm that it's
sound, I'd really appreciate it!


/* Must be called with inode->i_rwsem in held in write mode */
static bool __fsnotify_update_children_dentry_flags(struct inode *inode)
{
	struct dentry *child, *alias, *last_ref = NULL;
	alias = d_find_any_alias(inode);

	/*
	 * These lists can get very long, so we may need to sleep during
	 * iteration. Normally this would be impossible without a cursor,
	 * but since we have the inode locked exclusive, we're guaranteed
	 * that the directory won't be modified, so whichever dentry we
	 * pick to sleep on won't get moved.
	 */
	spin_lock(&alias->d_lock);
	list_for_each_entry(child, &alias->d_subdirs, d_child) {
		if (need_resched()) {
			/*
			 * We need to hold a reference while we sleep. But when
			 * we wake, dput() could free the dentry, invalidating
			 * the list pointers. We can't look at the list pointers
			 * until we re-lock the parent, and we can't dput() once
			 * we have the parent locked. So the solution is to hold
			 * onto our reference and free it the *next* time we drop
			 * alias->d_lock: either at the end of the function, or
			 * at the time of the next sleep.
			 */
			dget(child);
			spin_unlock(&alias->d_lock);
			dput(last_ref);
			last_ref = child;
			cond_resched();
			spin_lock(&alias->d_lock);
		}

		if (!child->d_inode)
			continue;

		spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
		if (watched)
			child->d_flags |= DCACHE_FSNOTIFY_PARENT_WATCHED;
		else
			child->d_flags &= ~DCACHE_FSNOTIFY_PARENT_WATCHED;
		spin_unlock(&child->d_lock);
	}
	spin_unlock(&alias->d_lock);
	dput(last_ref);
	dput(alias);
	return watched;
}

Thanks,
Stephen

[1]: https://lore.kernel.org/linux-fsdevel/20221028001016.332663-4-stephen.s.brennan@xxxxxxxxxx/
[2]: https://lore.kernel.org/linux-fsdevel/874jvigye9.fsf@xxxxxxxxxx/



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux