On Mon, Jun 10, 2024 at 02:46:06AM -0700, Jonathan Calmels wrote:
> On Sun, Jun 09, 2024 at 09:33:01PM GMT, Serge E. Hallyn wrote:
> > On Sun, Jun 09, 2024 at 03:43:35AM -0700, Jonathan Calmels wrote:
> > > This patch adds a new capability security bit designed to constrain a
> > > task’s userns capability set to its bounding set. The reason for this is
> > > twofold:
> > >
> > > - This serves as a quick and easy way to lock down a set of capabilities
> > >   for a task, thus ensuring that any namespace it creates will never be
> > >   more privileged than itself is.
> > > - This helps userspace transition to more secure defaults by not requiring
> > >   specific logic for the userns capability set, or libcap support.
> > >
> > > Example:
> > >
> > >   # capsh --secbits=$((1 << 8)) --drop=cap_sys_rawio -- \
> > >     -c 'unshare -r grep Cap /proc/self/status'
> > >   CapInh: 0000000000000000
> > >   CapPrm: 000001fffffdffff
> > >   CapEff: 000001fffffdffff
> > >   CapBnd: 000001fffffdffff
> > >   CapAmb: 0000000000000000
> > >   CapUNs: 000001fffffdffff
> >
> > But you are not (that I can see, in this or the previous patch)
> > keeping SECURE_USERNS_STRICT_CAPS in securebits on the next
> > level unshare. Though I think it's ok, because by then both
> > cap_userns and cap_bset are reduced and cap_userns can't be
> > expanded. (Sorry, just thinking aloud here)
>
> Right this is safe to reset, but maybe we do keep it if the secbit is
> locked? This is kind of a special case compared to the other bits.

I don't think it would be worth the extra complication in the secbits
code, and it's semantically very different from the cap_userns.

> > > + /* Limit userns capabilities to our parent's bounding set. */
> >
> > In the case of userns_install(), it will be the target user namespace
> > creator's bounding set, right? Not "our parent's"?
>
> Good point, I should reword this comment.
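
For anyone following along, the mask values in the quoted capsh example can be reproduced with a few lines of arithmetic. This is only a rough model of the behaviour described in the patch, not the kernel code: it assumes CAP_SYS_RAWIO is bit 17 and CAP_LAST_CAP is 40 (as in include/uapi/linux/capability.h around the time of this thread), takes bit 8 for SECURE_USERNS_STRICT_CAPS from the --secbits argument above, and the helper names are made up for illustration.

```python
# Assumed values from include/uapi/linux/capability.h (mid-2024 kernels).
CAP_SYS_RAWIO = 17   # capability removed by --drop=cap_sys_rawio
CAP_LAST_CAP = 40    # highest defined capability bit

# Bit position taken from the capsh example: --secbits=$((1 << 8)).
SECURE_USERNS_STRICT_CAPS = 1 << 8

# Every defined capability bit set.
FULL_SET = (1 << (CAP_LAST_CAP + 1)) - 1


def drop(cap_set, cap):
    """Return cap_set with capability number `cap` cleared."""
    return cap_set & ~(1 << cap)


def userns_caps_after_unshare(cap_userns, cap_bset, securebits):
    """Hypothetical model: with the strict bit set, the userns set is
    limited to the bounding set; otherwise it is left unchanged."""
    if securebits & SECURE_USERNS_STRICT_CAPS:
        return cap_userns & cap_bset
    return cap_userns


def fmt(cap_set):
    """Format a mask the way /proc/<pid>/status prints it."""
    return f"{cap_set:016x}"


if __name__ == "__main__":
    bset = drop(FULL_SET, CAP_SYS_RAWIO)
    uns = userns_caps_after_unshare(FULL_SET, bset, SECURE_USERNS_STRICT_CAPS)
    print("CapBnd:", fmt(bset))
    print("CapUNs:", fmt(uns))
```

Under those assumptions both lines come out as 000001fffffdffff, matching the CapBnd and CapUNs values in the example.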