On Mon, 2024-04-29 at 13:13 -0700, Kees Cook wrote:
> On Mon, Apr 29, 2024 at 02:31:19PM -0400, Martin K. Petersen wrote:
> > > > Kees,
> > > >
> > > > This patch seems to be lost. Gustavo reviewed it on January 15,
> > > > 2024 but the patch has not been applied since.
> > >
> > > This looks correct to me. I can pick this up if no one else snags
> > > it?
> >
> > I guess my original reply didn't make it out, I don't see it in the
> > archives.
> >
> > My objections were:
> >
> > 1. The original code is more readable to me than the proposed
> >    replacement.
>
> I guess this is a style preference. I find the proposed easier to
> read. It also removes lines while doing it. :)
>
> > 2. The original code has worked since introduced in 2012. Nobody
> >    has touched it since, presumably it's fine.
>
> The code itself is fine unless you have a 32-bit system with a
> malicious card, so yeah, near zero risk.

Well, no, actually zero: we assume plugged-in hardware operates
correctly (we had this argument in the driver hardening thread a while
ago), but in this particular case you'd have to have a card with a
very high number of ports, which would cause kernel allocations to
fail long before anything could introduce an overflow of
sizeof(struct csio_lnode *) * hw->num_lns.

> > 3. I don't have the hardware and thus no way of validating the
> > proposed changes.
>
> This is kind of an ongoing tension we have between driver code and
> refactoring efforts.

That's because we keep having cockups where we accept so-called "zero
risk" changes to older drivers, only to have people with the hardware
turn up months to years later demanding to know why we broke it.
Security is about balancing risks, and the risk here of a malicious
adversary crafting an attack against a driver so few people use (given
they'd also have to come up with modified hardware) seems equally
zero.
> And this isn't a case where we can show identical binary output,
> since this actively adds overflow checking via kcalloc() internals.

Overflow checking which, as I showed above, is unnecessary.

> > So what is the benefit of me accepting this patch? We have had
> > several regressions in these conversions. Had one just last week,
> > almost identical in nature to the one at hand.
>
> People are working through large piles of known "weak code patterns"
> with the goal of reaching 0 instances in the kernel. Usually this is
> for ongoing greater compiler flag coverage, but this particular one
> is harder for the compiler to warn on, so it's from Coccinelle
> patterns.

We understand the problem, and we're happy to investigate and then
explain why something like this can't be exploited, so what's the
issue with adding it to the exceptions list given that, as you said,
it's never going to be compiler detected?

> > I am all for fixing code which is undergoing active use and
> > development. But I really don't see the benefit of updating a
> > legacy driver which hasn't seen updates in ages. Why risk
> > introducing a regression?
>
> I see a common pattern where "why risk introducing a regression?"
> gets paired with "we can't test this code". I'm really not sure what
> to do about this given how much the kernel is changing all the time.

Well, it's a balance of risks, but given that there's zero chance of
exploiting the potential overflow, the balance would seem to lie on
the side of not risking the regression. I think if you could
demonstrate you were fixing an exploitable bug (without needing
modified hardware), the balance would lie differently.

> In this particular case, I guess all I can say is that it is a
> trivially correct change that uses a more robust API and more
> idiomatic allocation sizeof()s (i.e. use the sizeof() of what is
> being allocated, not a potentially disconnected struct name).
Which is somewhat similar to the statement other people made about the
strncpy replacement, which eventually turned out to cause a problem.

James