On Fri, Dec 01, 2023 at 10:45:05AM -0800, Jakub Kicinski wrote:
> On Fri, 1 Dec 2023 10:17:02 -0800 Kees Cook wrote:
> > > > -static inline int nla_len(const struct nlattr *nla)
> > > > +static inline u16 nla_len(const struct nlattr *nla)
> > > >  {
> > > > -	return nla->nla_len - NLA_HDRLEN;
> > > > +	return nla->nla_len > NLA_HDRLEN ? nla->nla_len - NLA_HDRLEN : 0;
> > > >  }
> > >
> > > Note that the NLA_HDRLEN is the length of struct nlattr.
> > > I mean of the @nla object that gets passed in as argument here.
> > > So accepting that nla->nla_len may be < NLA_HDRLEN means
> > > that we are okay with dereferencing a truncated object...
> > >
> > > We can consider making the return unsigned without the condition maybe?
> >
> > Yes, if we did it without the check, it'd do "less" damage on
> > wrap-around. (i.e. off by U16_MAX instead of off by INT_MAX).
> >
> > But I'd like to understand: what's the harm in adding the clamp? The
> > changes to the assembly are tiny:
> > https://godbolt.org/z/Ecvbzn1a1
>
> Hm, I wonder if my explanation was unclear or you disagree..
>
> This is the structure:
>
> struct nlattr {
> 	__u16           nla_len; // attr len, incl. this header
> 	__u16           nla_type;
> };
>
> and (removing no-op wrappers):
>
> #define NLA_HDRLEN		sizeof(struct nlattr)
>
> So going back to the code:
>
> 	return nla->nla_len > NLA_HDRLEN ? nla->nla_len - NLA_HDRLEN...
>
> We are reading nla->nla_len, which is the first 2 bytes of the structure.
> And then we check if the structure is... there?

I'm not debating whether it's there or not -- I'm saying the _contents_ of
"nlattr::nla_len", in the face of corruption or lack of initialization,
may be less than NLA_HDRLEN. (There's a lot of "but that can't happen"
that _does_ happen in the kernel, so I'm extra paranoid.)

> If we don't trust that struct nlattr which gets passed here is at least
> NLA_HDRLEN (4B) then why do we think it's safe to read nla_len (the
> first 2B of it)?

Type confusion (usually due to Use-after-Free flaws) means that a memory
region is valid (i.e. good pointer), but that the contents might have
gotten changed through other means. (To see examples of this with
struct msg_msg, see: https://syst3mfailure.io/wall-of-perdition/)

(On a related note, why does nla_len start at 4 instead of 0? i.e. why
does it include the size of nlattr? That seems redundant based on the
same logic you're using here.)

> That's why I was pointing at nla_ok(). nla_ok() takes the size of the
> buffer / message as an arg, so that it can also check if looking at
> nla_len itself is not going to be an OOB access. 99% of netlink buffers
> we parse come from user space. So it's not like someone could have
> mis-initialized the nla_len in the kernel and being graceful is helpful.
>
> The extra conditional is just a minor thing. The major thing is that
> unless I'm missing something the check makes me go 🤨️

My concern is that there are 562 callers of nla_len():

$ git grep '\bnla_len(\b' | wc -l
562

We have no way to be certain that all callers follow a successful
nla_ok() call.

Regardless, just moving from "int" to "u16" solves a bunch of value
range tracking pain that GCC appears to get upset about, so if you
really don't want the (tiny) sanity check, I can just send the u16
change.

-Kees

-- 
Kees Cook
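
For illustration, the three variants under discussion can be modeled in
userspace C to show what each returns when nla_len holds a value smaller
than the header. This is only a sketch under stated assumptions: uint16_t
stands in for __u16/u16, NLA_HDRLEN is simplified to sizeof(struct nlattr)
cast to int, and the names nla_len_int, nla_len_u16, nla_len_clamped and
check_variants are hypothetical, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct nlattr (assumption: same layout). */
struct nlattr {
	uint16_t nla_len;	/* attr len, incl. this header */
	uint16_t nla_type;
};

/* Simplified: the alignment wrapper is a no-op for this 4-byte struct. */
#define NLA_HDRLEN ((int)sizeof(struct nlattr))

/* Current form: int return, no check. A short nla_len yields a negative value. */
static inline int nla_len_int(const struct nlattr *nla)
{
	return nla->nla_len - NLA_HDRLEN;
}

/* u16 return without the check: a short nla_len wraps within 16 bits. */
static inline uint16_t nla_len_u16(const struct nlattr *nla)
{
	return (uint16_t)(nla->nla_len - NLA_HDRLEN);
}

/* u16 return with the clamp: a short nla_len yields 0. */
static inline uint16_t nla_len_clamped(const struct nlattr *nla)
{
	return nla->nla_len > NLA_HDRLEN ? nla->nla_len - NLA_HDRLEN : 0;
}

/* Hypothetical self-check comparing the variants on good and bad input. */
static void check_variants(void)
{
	struct nlattr good = { .nla_len = 12, .nla_type = 1 };
	struct nlattr bad  = { .nla_len = 2,  .nla_type = 1 };	/* < NLA_HDRLEN */

	/* Well-formed attribute: all three agree on an 8-byte payload. */
	assert(nla_len_int(&good) == 8);
	assert(nla_len_u16(&good) == 8);
	assert(nla_len_clamped(&good) == 8);

	/* Corrupted length: the variants diverge. */
	assert(nla_len_int(&bad) == -2);	/* huge if later converted to size_t */
	assert(nla_len_u16(&bad) == 65534);	/* bounded by U16_MAX */
	assert(nla_len_clamped(&bad) == 0);
}
```

Calling check_variants() from a small main() exercises both cases. The
model says nothing about whether such a corrupted nla_len can actually
reach nla_len() in the kernel; it only shows how far off each variant
would be if it did.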