On Tue, Oct 17, 2017 at 11:20:17AM +0200, Jan Kara wrote:
> The operation we are speaking about here is different. It is more along the
> lines of "release this device". And in the current world of containers,
> mount namespaces, etc. it is not trivial for userspace to implement this
> using umount(2) as Ted points out. I believe we could do that by walking
> through all mount points of a superblock and unmounting them (and I don't
> want to get into a discussion of how to efficiently implement that now, but
> in principle the kernel has all the necessary information).

Yes, this is what I want.  And regardless of how efficiently or not the
kernel can implement such an operation, by definition it will be more
efficient than if we have to do it in userspace.  (And I don't think it
has to be super-efficient, since this is not a hot path.  So for the
record, I wouldn't want to add any extra linked list references, etc.)

> What I'm a bit concerned about is the "release device reference" part - for
> a block device to stop looking busy we have to do that; however, then the
> block device can go away and the filesystem isn't prepared for that - we
> reference sb->s_bdev in lots of places, we have buffer heads which are part
> of bdev page cache, and probably other indirect assumptions I forgot about
> now. One solution to this is to not just stop accessing the device but
> truly clean up the filesystem up to a point where it is practically
> unmounted. I like this solution more but we have to be careful to block
> any access attempts high enough in VFS, ideally before ever entering fs code.

Right, so the first step would be to block access attempts high up in
the VFS.  The second would be to point any file descriptors at a revoked
NULL struct file, redirect any task struct's CWD so it is as if the
directory had been rmdir'ed, and munmap any mapped regions.  At that
point, all of the file descriptors will effectively be closed.
The third step would be to do a syncfs(), which will force out any dirty
pages.  And then finally, to call umount() in all of the namespaces,
which will naturally take care of any buffer or page cache references
once the ref count of the struct super goes to zero.

None of this has to be a single system call.  Perhaps it would make
sense for the first and second steps to be one system call --- call it
revokefs(2), perhaps.  And then the last step could be another system
call --- maybe umountall(2).

> Another option would be to do something similar to what we do when the
> device just gets unplugged under our hands - we detach bdev from gendisk,
> leave it dangling and invisible. But we would still somehow have to
> convince DM that the bdev practically went away by calling
> disk->fops->release() and it all just seems fragile to me. But I wanted to
> mention this option in case the above solution proves to be too difficult.

Yeah, that's just as fragile as using the ext4/xfs/f2fs shutdown/goingdown
ioctl.  In order to do this right I really think we need to get the VFS
involved, so it can be a real, clean unmount, as opposed to something
where we just rip the file system away from the bdev.

- Ted
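Jan's point that userspace cannot easily implement "release this device" with umount(2) can be made concrete. The following is a minimal C sketch, assuming Linux, glibc, and a file system backed by a single block device, of the naive userspace approach: scan /proc/self/mountinfo for every mount whose major:minor field matches the device, call syncfs(2) once through one of those mounts, then umount2(2) each mount point newest-first. It only sees the caller's own mount namespace, and any mount that still has open files, a process CWD, or a mapping fails with EBUSY; those are the gaps the revoke and umount-in-all-namespaces steps are meant to close. The names unescape_mountpoint(), parse_mountinfo_line() and umount_device_here() are hypothetical, and the newest-first order assumes mountinfo lists mounts roughly in the order they were made, which the kernel does not strictly promise.

```c
#define _GNU_SOURCE            /* for syncfs() and O_DIRECTORY */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <unistd.h>

#define MAX_MOUNTS 256

/* Decode, in place, the octal escapes (e.g. \040 for a space) that the
 * kernel uses for mount points in /proc/self/mountinfo. */
void unescape_mountpoint(char *s)
{
    char *out = s;
    while (*s) {
        if (s[0] == '\\' && s[1] >= '0' && s[1] <= '7' &&
            s[2] >= '0' && s[2] <= '7' && s[3] >= '0' && s[3] <= '7') {
            *out++ = (char)(((s[1] - '0') << 6) | ((s[2] - '0') << 3) | (s[3] - '0'));
            s += 4;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}

/* Hypothetical helper: extract the major:minor (3rd) and mount point (5th)
 * fields of one mountinfo line. Returns 0 on success, -1 if malformed. */
int parse_mountinfo_line(const char *line, unsigned *maj, unsigned *min,
                         char mnt[4096])
{
    if (sscanf(line, "%*d %*d %u:%u %*s %4095s", maj, min, mnt) != 3)
        return -1;
    unescape_mountpoint(mnt);
    return 0;
}

/* Hypothetical helper: unmount every mount of device want_maj:want_min that
 * is visible in the caller's mount namespace. Returns the number of mounts
 * that could not be unmounted (e.g. EBUSY), or -1 if mountinfo is unreadable. */
int umount_device_here(unsigned want_maj, unsigned want_min)
{
    static char mounts[MAX_MOUNTS][4096];
    char line[8192], mnt[4096];
    unsigned maj, min;
    int n = 0, failures = 0;

    FILE *f = fopen("/proc/self/mountinfo", "r");
    if (!f)
        return -1;
    while (n < MAX_MOUNTS && fgets(line, sizeof(line), f)) {
        if (parse_mountinfo_line(line, &maj, &min, mnt) == 0 &&
            maj == want_maj && min == want_min)
            strcpy(mounts[n++], mnt);
    }
    fclose(f);

    /* Flush dirty data once; syncfs() syncs the whole file system behind fd. */
    if (n > 0) {
        int fd = open(mounts[0], O_RDONLY | O_DIRECTORY);
        if (fd >= 0) {
            syncfs(fd);
            close(fd);
        }
    }

    /* Unmount in reverse order so later (stacked) mounts go first. */
    for (int i = n - 1; i >= 0; i--)
        if (umount2(mounts[i], 0) != 0)
            failures++;
    return failures;
}
```

A caller would obtain the device numbers with stat(2) on the block device node plus the major()/minor() macros from <sys/sysmacros.h>, and would need CAP_SYS_ADMIN for umount2() to succeed.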
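The second step Ted describes, pointing file descriptors at a revoked "NULL" struct file, can be pictured with a small userspace toy model. This is not kernel code: struct toy_file, struct toy_file_ops, toy_read() and toy_revoke() are hypothetical stand-ins for struct file, struct file_operations and the fd table, and returning -EIO from a revoked read is an arbitrary choice made for this sketch. Every descriptor-table entry that refers to the target (toy) super block is swapped for one shared file whose operations never call into the file system, so later reads fail without touching the device.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, heavily simplified stand-ins for struct file and
 * struct file_operations; NOT the real kernel definitions. */
struct toy_file;

struct toy_file_ops {
    ssize_t (*read)(struct toy_file *f, char *buf, size_t len);
};

struct toy_file {
    const struct toy_file_ops *ops;
    int sb_id;              /* which (toy) super block the file lives on */
    const char *data;       /* stands in for the file's contents */
};

/* Normal read: copy from the file's contents. */
static ssize_t live_read(struct toy_file *f, char *buf, size_t len)
{
    size_t n = strlen(f->data);
    if (n > len)
        n = len;
    memcpy(buf, f->data, n);
    return (ssize_t)n;
}

/* Revoked read: never reaches the file system (-EIO is an arbitrary choice). */
static ssize_t revoked_read(struct toy_file *f, char *buf, size_t len)
{
    (void)f; (void)buf; (void)len;
    return -EIO;
}

const struct toy_file_ops live_ops = { .read = live_read };
static const struct toy_file_ops revoked_ops = { .read = revoked_read };

/* One shared "revoked NULL file" that every revoked descriptor points at. */
static struct toy_file revoked_file = { &revoked_ops, -1, "" };

/* Dispatch a read through whatever file the descriptor currently points at. */
ssize_t toy_read(struct toy_file **fdtable, int fd, char *buf, size_t len)
{
    struct toy_file *f = fdtable[fd];
    return f->ops->read(f, buf, len);
}

/* Repoint every descriptor that refers to super block sb_id at the revoked
 * file; a real implementation would also drop its reference to the old file.
 * Returns how many descriptors were revoked. */
int toy_revoke(struct toy_file **fdtable, int nfds, int sb_id)
{
    int revoked = 0;
    for (int fd = 0; fd < nfds; fd++) {
        if (fdtable[fd] && fdtable[fd]->sb_id == sb_id) {
            fdtable[fd] = &revoked_file;
            revoked++;
        }
    }
    return revoked;
}
```

The sketch ignores the genuinely hard part, racing against system calls that are already executing inside the file system's code, which is one reason blocking new access high in the VFS has to come first.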