On Mon 06-11-17 11:25:34, Luis R. Rodriguez wrote:
> On Tue, Oct 17, 2017 at 7:12 AM, Theodore Ts'o <tytso@xxxxxxx> wrote:
> > On Tue, Oct 17, 2017 at 11:20:17AM +0200, Jan Kara wrote:
> >> The operation we are speaking about here is different. It is more along the
> >> lines of "release this device". And in the current world of containers,
> >> mount namespaces, etc. it is not trivial for userspace to implement this
> >> using umount(2) as Ted points out. I believe we could do that by walking
> >> through all mount points of a superblock and unmounting them (and I don't
> >> want to get into a discussion how to efficiently implement that now but in
> >> principle the kernel has all the necessary information).
> >
> > Yes, this is what I want. And regardless of how efficiently or not
> > the kernel can implement such an operation, by definition it will be
> > more efficient than if we have to do it in userspace.
>
> It seems most folks agree we could all benefit from this, to help
> userspace with a sane implementation.
>
> >> What I'm a bit concerned about is the "release device reference" part - for
> >> a block device to stop looking busy we have to do that however then the
> >> block device can go away and the filesystem isn't prepared to that - we
> >> reference sb->s_bdev in lots of places, we have buffer heads which are part
> >> of bdev page cache, and probably other indirect assumptions I forgot about
> >> now.
>
> Is this new operation really the only place where such type of work
> could be useful for, or are there existing use cases this sort of
> functionality could also be used for?

The ability to "invalidate" an open file descriptor so that it no longer
points to the object it used to is also useful for other cases, I guess...
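
To make the userspace side of the umount(2) point quoted above concrete, here
is a minimal sketch of what "walk all mount points of a superblock" looks like
from userspace today. It only parses mountinfo text in the format documented in
proc(5); the function name `mounts_of_device` and the sample data are
illustrative assumptions, not something proposed in this thread. Note that
/proc/self/mountinfo shows only the caller's mount namespace, which is exactly
the limitation being discussed.

```python
import re


def _unescape(path):
    # mountinfo encodes space, tab, newline and backslash as octal escapes
    # such as \040; turn them back into the real characters.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), path)


def mounts_of_device(mountinfo_text, dev):
    """Return the mount points whose major:minor field equals `dev`.

    `mountinfo_text` is text in /proc/<pid>/mountinfo format; `dev` is a
    "major:minor" string such as "8:1".
    """
    points = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # Field 3 is major:minor, field 5 is the mount point.
        if len(fields) >= 5 and fields[2] == dev:
            points.append(_unescape(fields[4]))
    return points


if __name__ == "__main__":
    # Only the caller's own mount namespace is visible here.
    with open("/proc/self/mountinfo") as f:
        for point in mounts_of_device(f.read(), "8:1"):
            print(point)
```

A caller would still have to repeat this for every mount namespace it can
reach and then call umount(2) on each result, racing against new mounts the
whole time.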

> For instance I don't think we do something similar to revokefs(2) (as
> described below) when a device has been removed from a system, you
> seem to suggest we remove the dev from gendisk leaving it dangling and
> invisible. But other than this, it would seem it's up to the filesystem
> to get anything else implemented correctly?

Yes, that's the current situation. When the device is yanked from under a
filesystem, the current implementation makes it relatively straightforward
from the fs POV - for all the fs cares about, the underlying device still
exists. It just returns errors for any IO done to it. It is up to the fs
implementation to deal with that and to be able to shut itself down correctly
in such a case.

> > This all doesn't have to be a single system call. Perhaps it would
> > make sense for first and second step to be one system call --- call it
> > revokefs(2), perhaps. And then the last step could be another system
> > call --- maybe umountall(2).
>
> Wouldn't *some* part of this also help *enhance* filesystem suspend /
> thaw be used on system suspend / resume as well?
>
> If I may, if we split these up, into two, say revokefs(2) and
> umountall(2), how about:
>
> a) revokefs(2): ensures all file descriptors for the fs are closed
>    - blocks access attempts high up in VFS
>    - point any file descriptor to a revoked null struct file
>    - redirect any task struct CWD's so as if the directory had rmdir'd
>    - munmap any mapped regions
>
> Of these only the first one seems useful for fs suspend?

If you are referring to "blocks access attempts high up in VFS", that already
happens for writes when you freeze the filesystem. Also, suspend is different
in that userspace is already frozen by the time you get to freezing
filesystems, so you care only about in-kernel users, and there you do not have
a standard set of entry points anyway... So I don't see much overlap with
system suspend here.
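
For reference, the existing freeze path mentioned above can be driven from
userspace with the FIFREEZE and FITHAW ioctls from <linux/fs.h>. The sketch
below is illustrative only: the helper names `_iowr` and `freeze_then_thaw`
are made up for this example, the ioctl numbers are computed assuming the
generic ioctl encoding used on x86 and arm (some architectures encode them
differently), and the calls need CAP_SYS_ADMIN.

```python
import fcntl
import os


def _iowr(type_char, nr, size):
    # Generic _IOWR() encoding: direction (read|write = 3) in bits 30-31,
    # argument size in bits 16-29, type in bits 8-15, number in bits 0-7.
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr


FIFREEZE = _iowr("X", 119, 4)  # _IOWR('X', 119, int)
FITHAW = _iowr("X", 120, 4)    # _IOWR('X', 120, int)


def freeze_then_thaw(mount_point, while_frozen):
    """Freeze the fs at `mount_point`, run `while_frozen()`, then thaw it."""
    fd = os.open(mount_point, os.O_RDONLY | os.O_DIRECTORY)
    try:
        fcntl.ioctl(fd, FIFREEZE, 0)
        try:
            # New writes to the filesystem block until the thaw below.
            while_frozen()
        finally:
            fcntl.ioctl(fd, FITHAW, 0)
    finally:
        os.close(fd)
```

Freezing blocks writers but does nothing about open file descriptors, CWDs or
mappings, which is why it covers only the first bullet of the revokefs(2)
list.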

> b) umountall(2): properly unmounts filesystem from all namespaces
>    - May need to verify if revokefs(2) was called, if so, now that all
>      file descriptors should be closed, do syncfs() to force out any
>      dirty pages

IMHO it doesn't need to verify this. The unmount will simply fail if someone
is still using the fs.

>    - unmount() in all namespaces, this takes care of any buffer or page
>      cache reference once the ref count of the struct super block goes
>      to zero

								Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR