On Mon, 2017-01-16 at 13:39 -0500, Oleg Drokin wrote:
> On Jan 16, 2017, at 1:21 PM, James Bottomley wrote:
>
> > On Mon, 2017-01-16 at 13:02 -0500, Oleg Drokin wrote:
> > > On Jan 16, 2017, at 12:32 PM, James Bottomley wrote:
> > >
> > > > On Sun, 2017-01-15 at 18:38 -0500, Oleg Drokin wrote:
> > > > > A container support from filesystems is also very relevant
> > > > > to us since Lustre is used more and more in such settings.
> > > >
> > > > I've added the containers ML to the cc just in case. Can you
> > > > add more colour to this, please? What container support for
> > > > filesystems do you think we need beyond the user namespace in
> > > > the superblock?
> > >
> > > Namespace access is necessary, we might need it before the
> > > superblock is there too (say during mount we might need kerberos
> > > credentials fetched to properly authenticate this mount instance
> > > to the server).
> >
> > The superblock namespace is mostly for uid/gid changes across the
> > kernel <-> filesystem boundary.
>
> That's on the kernel<->filesystem, but inside of the FS there might
> be other considerations that you might want to attach there.
> Say when you are encrypting the traffic to the server you want
> to use the right keys.

So this is the keyring namespace? It was mentioned at KS, but, as far
as I can tell, not discussed in the Containers MC that followed, so
I've no idea what the status is.

> It's all relatively easy when you have a separate mount there, so
> you can store the credentials in the superblock, but we lose on the
> cache sharing, for example (I don't know how important that is).

It depends what you mean by "cache sharing". If you're thinking of the
page cache, then it all just works, provided the underlying inode
doesn't change.
If you're in the situation where the container orchestration system
knows that two files are the same but there's been a change of
underlying device (FUSE passthrough, say) so the inode is different
(the Docker double caching problem) and you need some way of forcibly
combining them in the page cache, that was discussed a couple of years
ago, and Virtuozzo people have patches, but I haven't seen much
upstream agreement.

> > The actual container namespaces will already be set up by the time
> > the mount is done (assuming mount within a container), so you have
> > them all present. Usually you get the information for credentials
> > from a combination of the UTS namespace (host/domain name) and the
> > mount namespace (credentials provisioned to container filesystem).
>
> Yes, when mounting from a container it's possible to fetch this info
> and pass it around, is mounting from outside of the container
> important too?

Mounting from outside the container usually involves entering the
container and performing the mount. However, the way you enter the
container can pull stuff in from outside (like file descriptors).

> > Perhaps if you described the actual problem you're seeing rather
> > than try to relate it to what I said about superblock namespace
> > (which is probably irrelevant), we could figure out what the issue
> > is.
>
> Right now the deployments are simple and we do not have any major
> issues (other than certain caching overzealousness that throws cgroup
> memory accounting off), but learning what other problems are there in
> this space and what we should be looking for.

You might need to canvass the other users to see if there is anything
viable to discuss.

James