Re: [PATCH review 11/11] mnt: Honor MNT_LOCKED when detaching mounts

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) · Fri, 09 Jan 2015 15:30:10 -0600

Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:

> On Thu, Jan 08, 2015 at 10:32:13PM +0000, Al Viro wrote:
>> 	a) easily expressed with that scheme (behaviour is almost identical
>> to what your series does, except that we'd empty mnt_child right in
>> mntput_no_expire(), rather than keeping it until cleanup_mnt()) and
>> 	b) easily generalized to MNT_DETACH with lazy dissolving; in fact,
>> the only difference would be to treat *all* submounts as "don't put on
>> global list, don't remove from child list", not just the MNT_LOCKED ones.

I have no problem with treating all mounts whose parents are also being
unmounted as "don't put on global list, don't remove from child list"

Replacing the mnt_ex_mountpoint struct path with a fs_pin seems
reasonable.

We need to be careful with mounts that have parents that are not being
unmounted, with respect to the mount hash table but that just looks like
a detail in the overall scheme of things.

And if delaying the disolution of mounts in the mount detach case is a
long held wishlist item I am all for going there.  I have a patch I was
playing with that did that, I just didn't include it because that is a
bug fix kind of thing.

>> I'll play around with that today and tomorrow; hopefully, I'll have a postable
>> variant by the weekend...
>
> Hmm...  Linus, what do you think of the following?
>
> struct foo {
> 	int done;
> 	wait_queue_head_t wait;
> 	...
> };
>

Where does the rcu_read_lock() happen?
I assume this is kill from fs_pin.kill?

> void kill_foo(struct foo *p)
> {
>         wait_queue_t wait;
> 	wait.flags = WQ_FLAG_EXCLUSIVE;
> 	wait.private = current;
> 	wait.func = autoremove_wake_function;
> 	spin_lock_irq(&p->wait.lock);
> 	if (likely(!p->done)) {
> 		p->done = -1;
> 		spin_unlock_irq(&p->wait.lock);
> 		rcu_read_unlock();
> 		/* do cleanup */
> 		...
> 		/* remove references to *p */
> 		...
> 		spin_lock_irq(&p->wait.lock);
> 		p->done = 1;	
> 		wake_up_locked(&p->wait);
> 		spin_unlock_irq(&p->wait.lock);
> 		/* RCU-schedule freeing of p */
> 		...
> 		return;
> 	}
> 	if (p->done > 0) {
> 		spin_unlock_irq(&p->wait.lock);
> 		rcu_read_unlock();
> 		return;
> 	}
> 	__add_wait_queue_tail(&p->wait, &wait);
> 	while (1) {
> 		set_current_state(TASK_UNINITERRUPTIBLE);
> 		spin_unlock_irq(&p->wait.lock);
> 		rcu_read_unlock();
> 		schedule();
> 		rcu_read_lock();
> 		if (likely(list_empty(&wait.task_list)))
> 			break;
> 		/* OK, we know p couldn't have been freed yet */
> 		spin_lock_irq(&p->wait.lock);
> 		if (p->done > 0) {
> 			spin_unlock_irq(&p->wait.lock);
> 			break;
> 		}
> 	}
> 	rcu_read_unlock();
> }
>
> AFAICS, that ought to be safe - the first caller makes sure that everybody
> else will wait for it to finish, everybody else (ones who come via references
> before the first one has removed them) will be waiting for the first one
> to do wakeup *and* will take care not to touch the victim unless they knows
> it's still there.
>
> Do you see any problems with the above?  I wonder if I ended up open-coding
> something already existing in there...

Your struct foo looks an awful lot like struct completion at first
glance.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html