Quoting Miklos Szeredi (miklos@xxxxxxxxxx): > > > + t->table[0].mode = 0644; > > > > Yikes, this could be a problem for containers, as it's simply tied to > > uid 0, whereas tying it to a capability would let us solve it with > > capability bounds. > > > > This might mean more urgency to get user namespaces working at least > > with sysfs, else this is a quick way around having CAP_SYS_ADMIN taken > > out of a container's capability bounding set. > > I think I understand the problem, but not the solution. How do user > namespaces going to help? Well it somewhat depends on how we implement userns for filesystems in the first place, and whether we end up splitting sysfs into sub-filesystems as I think Eric Biederman has been advocating. My thoughts had been running along the lines of just tagging vfsmounts with userns of the mounting process. A task from outside the mounting process' namespace would get user other permissions whether or not its uid was the owning uid or uid 0 (unless the task had CAP_NS_OVERRIDE). But really it gets more complicated for sysfs than something like ext2 since we really want to be able to filter files and directories for different namespaces... Handling sysfs user namespaces before we sort out the rest of the sysfs stuff (being hashed out with network namespaces) seems like jumping the gun a bit. > Maybe sysctls just need to check capabilities, instead of uids. I > think that would make a lot of sense anyway. Would it be as simple as tagging the inodes with capability sets? One set for writing, or one each for reading and writing? thanks, -serge - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html