Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes: > On Thu, Jan 08, 2015 at 10:32:13PM +0000, Al Viro wrote: >> a) easily expressed with that scheme (behaviour is almost identical >> to what your series does, except that we'd empty mnt_child right in >> mntput_no_expire(), rather than keeping it until cleanup_mnt()) and >> b) easily generalized to MNT_DETACH with lazy dissolving; in fact, >> the only difference would be to treat *all* submounts as "don't put on >> global list, don't remove from child list", not just the MNT_LOCKED ones. I have no problem with treating all mounts whose parents are also being unmounted as "don't put on global list, don't remove from child list" Replacing the mnt_ex_mountpoint struct path with a fs_pin seems reasonable. We need to be careful with mounts that have parents that are not being unmounted, with respect to the mount hash table but that just looks like a detail in the overall scheme of things. And if delaying the disolution of mounts in the mount detach case is a long held wishlist item I am all for going there. I have a patch I was playing with that did that, I just didn't include it because that is a bug fix kind of thing. >> I'll play around with that today and tomorrow; hopefully, I'll have a postable >> variant by the weekend... > > Hmm... Linus, what do you think of the following? > > struct foo { > int done; > wait_queue_head_t wait; > ... > }; > Where does the rcu_read_lock() happen? I assume this is kill from fs_pin.kill? > void kill_foo(struct foo *p) > { > wait_queue_t wait; > wait.flags = WQ_FLAG_EXCLUSIVE; > wait.private = current; > wait.func = autoremove_wake_function; > spin_lock_irq(&p->wait.lock); > if (likely(!p->done)) { > p->done = -1; > spin_unlock_irq(&p->wait.lock); > rcu_read_unlock(); > /* do cleanup */ > ... > /* remove references to *p */ > ... > spin_lock_irq(&p->wait.lock); > p->done = 1; > wake_up_locked(&p->wait); > spin_unlock_irq(&p->wait.lock); > /* RCU-schedule freeing of p */ > ... > return; > } > if (p->done > 0) { > spin_unlock_irq(&p->wait.lock); > rcu_read_unlock(); > return; > } > __add_wait_queue_tail(&p->wait, &wait); > while (1) { > set_current_state(TASK_UNINITERRUPTIBLE); > spin_unlock_irq(&p->wait.lock); > rcu_read_unlock(); > schedule(); > rcu_read_lock(); > if (likely(list_empty(&wait.task_list))) > break; > /* OK, we know p couldn't have been freed yet */ > spin_lock_irq(&p->wait.lock); > if (p->done > 0) { > spin_unlock_irq(&p->wait.lock); > break; > } > } > rcu_read_unlock(); > } > > AFAICS, that ought to be safe - the first caller makes sure that everybody > else will wait for it to finish, everybody else (ones who come via references > before the first one has removed them) will be waiting for the first one > to do wakeup *and* will take care not to touch the victim unless they knows > it's still there. > > Do you see any problems with the above? I wonder if I ended up open-coding > something already existing in there... Your struct foo looks an awful lot like struct completion at first glance. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html