On Tue, Oct 17, 2017 at 7:12 AM, Theodore Ts'o <tytso@xxxxxxx> wrote:
> On Tue, Oct 17, 2017 at 11:20:17AM +0200, Jan Kara wrote:
>> The operation we are speaking about here is different. It is more along the
>> lines of "release this device". And in the current world of containers,
>> mount namespaces, etc. it is not trivial for userspace to implement this
>> using umount(2) as Ted points out. I believe we could do that by walking
>> through all mount points of a superblock and unmounting them (and I don't
>> want to get into a discussion how to efficiently implement that now but in
>> principle the kernel has all the necessary information).
>
> Yes, this is what I want.  And regardless of how efficiently or not
> the kernel can implement such an operation, by definition it will be
> more efficient than if we have to do it in userspace.

It seems most folks agree we could all benefit from this, to help
userspace with a sane implementation.

>> What I'm a bit concerned about is the "release device reference" part - for
>> a block device to stop looking busy we have to do that however then the
>> block device can go away and the filesystem isn't prepared for that - we
>> reference sb->s_bdev in lots of places, we have buffer heads which are part
>> of bdev page cache, and probably other indirect assumptions I forgot about
>> now.

Is this new operation really the only place where this type of work
would be useful, or are there existing use cases this sort of
functionality could also serve?

For instance, I don't think we do something similar to revokefs(2) (as
described below) when a device has been removed from a system; you seem
to suggest we remove the dev from gendisk, leaving it dangling and
invisible. But other than this, it would seem it's up to the filesystem
to get anything else implemented correctly?

> This all doesn't have to be a single system call.
> Perhaps it would make sense for first and second step to be one
> system call --- call it revokefs(2), perhaps.  And then the last step
> could be another system call --- maybe umountall(2).

Wouldn't *some* part of this also help *enhance* filesystem suspend /
thaw, so it can be used on system suspend / resume as well?

If I may, if we split these up into two, say revokefs(2) and
umountall(2), how about:

a) revokefs(2): ensures all file descriptors for the fs are closed
   - blocks access attempts high up in VFS
   - point any file descriptor to a revoked null struct file
   - redirect any task struct CWDs, as if the directory had been rmdir'd
   - munmap any mapped regions

Of these only the first one seems useful for fs suspend?

b) umountall(2): properly unmounts filesystem from all namespaces
   - May need to verify if revokefs(2) was called; if so, now that all
     file descriptors should be closed, do syncfs() to force out any
     dirty pages
   - unmount() in all namespaces; this takes care of any buffer or page
     cache reference once the ref count of the struct super block goes
     to zero

  Luis
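
For illustration only, here is a rough userspace sketch of the "walk all
mount points" step Jan mentions, to show why doing this outside the
kernel falls short. It parses text in the /proc/self/mountinfo format
and lists the mount points backed by a given major:minor device. The
helper name mounts_for_device() is made up for this example and is not
an existing API. Since /proc/self/mountinfo only describes the caller's
own mount namespace, mounts of the same superblock in other namespaces
stay invisible here, which is the gap a umountall(2) would have to
close.

```python
import re


def mounts_for_device(mountinfo_text, dev):
    """Return mount points whose major:minor field equals dev.

    mountinfo_text is text in /proc/<pid>/mountinfo format; dev is a
    string such as "8:1". Hypothetical helper, for illustration only.
    """
    points = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # Fields: mount ID, parent ID, major:minor, root, mount point, ...
        if len(fields) < 5 or fields[2] != dev:
            continue
        # Whitespace in paths is escaped as octal, e.g. "\040" for a space.
        point = re.sub(r"\\([0-7]{3})",
                       lambda m: chr(int(m.group(1), 8)),
                       fields[4])
        points.append(point)
    return points


if __name__ == "__main__":
    # Only covers the mount namespace of the calling process.
    with open("/proc/self/mountinfo") as f:
        for point in mounts_for_device(f.read(), "8:1"):
            print(point)
```

A caller would still have to umount(2) each of these, and repeat the
exercise from inside every other mount namespace it can enter, with
nothing stopping new mounts from appearing in between.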