James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> writes:

> On Sat, 2016-05-14 at 10:53 +0100, Djalal Harouni wrote:

Just a couple of quick comments from a very high level design point.

- I think a shiftfs is valuable in the same way that overlayfs is
  valuable.

  Especially in the Docker case where a lot of containers want a shared
  base image (for efficiency), but it is desirable to run those
  containers in different user namespaces for safety.

- It is also the plan to make it possible to mount a filesystem where
  the uids and gids of that filesystem on disk do not have a one to one
  mapping to kernel uids and gids.  99% of the work has already been
  done, for all filesystems except XFS.

  That said there are some significant issues to work through, before
  something like that can be enabled.

  * Handling of uids/gids on disk that don't map into a kuid/kgid.
  * Safety from poisoned filesystem images.

  I have slowly been working with Seth Forshee on these issues as the
  last thing I want is to introduce more security bugs right now.  Seth,
  being a braver man than I am, has already merged his changes into the
  Ubuntu kernel.

  Right now we are targeting fuse, because fuse is already designed to
  handle poisoned filesystem images.  So to safely enable this kind of
  mapping for fuse is not a giant step.

  The big thing from my point of view is to get the VFS interfaces
  correct so that the VFS handles all of the weird cases that come up
  with uids and gids that don't map, and any other weird cases, keeping
  the weird bits out of the filesystems.

James, Djalal, I regret I have not been able to read through either of
your patches closely yet.  From a high level view I believe there are
use cases for both approaches, and the use cases do not necessarily
overlap.

Djalal, I think you are seeing the upsides and not the practical dangers
of poisoned filesystem images.

James, I think you are missing the fact that all filesystems already
have the make_kuid and make_kgid calls right where the data comes off
disk, and the from_kuid and from_kgid calls right where the on-disk data
is being created just before it goes on disk.  Which means that the
actual impact on filesystems of the translation is trivial.

Where the actual impact on filesystems is much higher is the
infrastructure needed to ensure poisoned filesystem images do not cause
a kernel compromise.  That extends to the filesystem testing and code
review process and beyond, and is more than just a kernel problem.
Hardening the attack surface on the disk side of filesystems is
difficult, especially without impacting filesystem performance.

So I don't think it makes sense to frame this as an either/or situation.
I think there is a need for both solutions.

Djalal, if you could work with Seth I think that would be very useful.
I know I am dragging my heels there but I really hope I can dig in and
get everything reviewed and merged soonish.

James, if you could see shiftfs with a different set of merits than what
Djalal is doing I think that would be useful, as it would allow everyone
to concentrate on getting the bugs out of their solutions.

That said I am not certain shiftfs makes sense without Seth's patches to
handle the weird cases at the VFS level.  What do you do with uids and
gids that don't map?  You can reinvent how to handle the strange cases
in shiftfs, or we can work on solving this problem at the VFS level so
people don't have to go through the error prone work of reinventing
solutions.

The big ugly nasty in all of this is that we are fundamentally dealing
with uids and gids, which are security identifiers.  Practically any bug
is exploitable and CVE worthy.  So it makes sense to tread very
carefully.  Even with care it can take months if not years to get the
number of bugs down to a level where you are not the favorite target of
people looking for exploitable kernel bugs.
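
To make the question of ids that don't map concrete, here is a minimal
userspace sketch of the kind of translation that make_kuid and from_kuid
perform.  It is not kernel code: struct id_extent, map_down, map_up and
INVALID_ID are hypothetical names chosen for illustration, and the
single-extent table in the usage note is an assumed example mapping.
The point is only that both directions can fail, and every caller has to
decide what an unmapped id means.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel for "no mapping exists" (illustration only). */
#define INVALID_ID ((uint32_t)-1)

/* One contiguous range of an id mapping (hypothetical layout). */
struct id_extent {
    uint32_t first;       /* first id as stored on disk / seen in the namespace */
    uint32_t lower_first; /* first corresponding kernel-side id */
    uint32_t count;       /* number of ids in the range */
};

/* On-disk id -> kernel-side id, loosely analogous to make_kuid. */
static uint32_t map_down(const struct id_extent *map, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++) {
        if (id >= map[i].first && id - map[i].first < map[i].count)
            return map[i].lower_first + (id - map[i].first);
    }
    return INVALID_ID; /* id on disk has no kernel-side counterpart */
}

/* Kernel-side id -> on-disk id, loosely analogous to from_kuid. */
static uint32_t map_up(const struct id_extent *map, size_t n, uint32_t kid)
{
    for (size_t i = 0; i < n; i++) {
        if (kid >= map[i].lower_first &&
            kid - map[i].lower_first < map[i].count)
            return map[i].first + (kid - map[i].lower_first);
    }
    return INVALID_ID; /* kernel-side id cannot be represented on disk */
}
```

With an assumed table of { 0, 100000, 65536 }, on-disk id 1000 maps down
to 101000, while on-disk id 65536 and kernel-side id 0 both come back as
INVALID_ID.  Those are the weird cases that either each filesystem or
the VFS has to handle.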

Eric