On Sun, Jun 09, 2024 at 09:33:01PM GMT, Serge E. Hallyn wrote:
> On Sun, Jun 09, 2024 at 03:43:35AM -0700, Jonathan Calmels wrote:
> > This patch adds a new capability security bit designed to constrain a
> > task’s userns capability set to its bounding set. The reason for this is
> > twofold:
> >
> > - This serves as a quick and easy way to lock down a set of capabilities
> >   for a task, thus ensuring that any namespace it creates will never be
> >   more privileged than itself is.
> > - This helps userspace transition to more secure defaults by not requiring
> >   specific logic for the userns capability set, or libcap support.
> >
> > Example:
> >
> >     # capsh --secbits=$((1 << 8)) --drop=cap_sys_rawio -- \
> >         -c 'unshare -r grep Cap /proc/self/status'
> >     CapInh: 0000000000000000
> >     CapPrm: 000001fffffdffff
> >     CapEff: 000001fffffdffff
> >     CapBnd: 000001fffffdffff
> >     CapAmb: 0000000000000000
> >     CapUNs: 000001fffffdffff
>
> But you are not (that I can see, in this or the previous patch)
> keeping SECURE_USERNS_STRICT_CAPS in securebits on the next
> level unshare. Though I think it's ok, because by then both
> cap_userns and cap_bset are reduced and cap_userns can't be
> expanded. (Sorry, just thinking aloud here)

Right, this is safe to reset, but maybe we should keep it if the secbit is
locked? This is kind of a special case compared to the other bits.

> > +	/* Limit userns capabilities to our parent's bounding set. */
>
> In the case of userns_install(), it will be the target user namespace
> creator's bounding set, right? Not "our parent's"?

Good point, I should reword this comment.