On Wed, Aug 8, 2018 at 6:27 PM, Will Deacon <will.deacon@xxxxxxx> wrote:
>> >> > Thanks for tracking these cases down and going through each of them. The
>> >> > obvious follow-up question is: how do we ensure that we keep on top of
>> >> > this in mainline? Are you going to repeat your experiment at every kernel
>> >> > release or every -rc or something else? I really can't see how we can
>> >> > maintain this in the long run, especially given that the coverage we have
>> >> > is only dynamic -- do you have an idea of how much coverage you're actually
>> >> > getting for, say, a defconfig+modules build?
>> >> >
>> >> > I'd really like to enable pointer tagging in the kernel, I'm just still
>> >> > failing to see how we can do it in a controlled manner where we can reason
>> >> > about the semantic changes using something other than a best-effort,
>> >> > case-by-case basis which is likely to be fragile and error-prone.
>> >> > Unfortunately, if that's all we have, then this gets relegated to a
>> >> > debug feature, which sort of defeats the point in my opinion.
>> >>
>> >> Well, in some cases there is no other way as resorting to dynamic testing.
>> >> How do we ensure that kernel does not dereference NULL pointers, does
>> >> not access objects after free or out of bounds? Nohow. And, yes, it's
>> >> constant maintenance burden resolved via dynamic testing.
>> >
>> > ... and the advantage of NULL pointer issues is that you're likely to see
>> > them as a synchronous exception at runtime, regardless of architecture and
>> > regardless of Kconfig options. With pointer tagging, that's certainly not
>> > the case, and so I don't think we can just treat issues there like we do for
>> > NULL pointers.
>>
>> Well, let's take use-after-frees, out-of-bounds, info leaks, data
>> races is a good example, deadlocks and just logical bugs...
>
> Ok, but it was you that brought up NULL pointers, so there's some goalpost
> moving here.

I moved it only because our views on bugs seem to be somewhat
different. I would put them all, including NULL derefs, into the same
bucket of bugs. But the point I wanted to make holds if we take NULL
derefs out of the equation too, so I took them out so that we don't
concentrate on "synchronous exceptions" only.

> And as with NULL pointers, all of the issues you mention above
> apply to other architectures and the majority of their configurations, so my
> concerns about this feature remain.
>
>> > If you want to enable khwasan in "production" and since enabling it
>> > could potentially change the behaviour of existing code paths, the
>> > run-time validation space doubles as we'd need to get the same code
>> > coverage with and without the feature being enabled.
>>
>> This is true for just any change in configs, sysctls or just a
>> different workload. Any of this can enable new code, exiting code
>> working differently, or just working with data in new states. And we
>> have tens of thousands of bugs, so blindly deploying anything new to
>> production without proper testing is a bad idea. It's not specific to
>> HWASAN in any way. And when you enable HWASAN you actually do mean to
>> retest everything as hard as possible.
>
> I suppose I'm trying to understand whether we have to resort to testing, or
> whether we can do better. I'm really uncomfortable with testing as our only
> means of getting this right because this is a non-standard, arm64-specific
> option and I don't think it will get very much testing in mainline at all.
> Rather, we'll get spurious bug reports from forks of -stable many releases
> later and we'll actually be worse-off for it.
>
>> And in the end we do not seem to have any action points here, right?
>
> Right now, it feels like this series trades one set of bugs for another,
> so I'd like to get to a position where this new set of bugs is genuinely
> more manageable (i.e. detectable, fixable, preventable) than the old set.
> Unfortunately, the only suggestion seems to be "testing", which I really
> don't find convincing :(
>
> Could we do things like:
>
> - Set up a dedicated arm64 test farm, running mainline and with a public
>   frontend, aimed at getting maximum coverage of the kernel with KHWASAN
>   enabled?

FWIW we could try to set up a syzbot instance with qemu/arm64 emulation.
We have run such a combination a few times, but I am not sure how stable
it will be wrt flaky timeouts/stalls/etc. If it works, it will give
instant coverage of about 1MLOC.

> - Have an implementation of KHWASAN for other architectures? (Is this even
>   possible?)
>
> - Have a compiler plugin to clear out the tag for pointer arithmetic?
>   Could we WARN if two pointers are compared with different tags?
>   Could we manipulate the tag on cast-to-pointer so that a mismatch would
>   be qualifier to say that pointer was created via a cast?
>
> - ...
>
> ?
>
> Will
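
To make the "compared with different tags" case concrete, here is a minimal
user-space sketch. It is not code from the series: it only models a tag
stored in the top byte (bits 63:56) of a 64-bit address, and the names
tag_set(), tag_get(), tag_strip(), same_address(), tags_differ() and
tag_demo() are all hypothetical helpers made up for illustration. It also
assumes that an untagged address has a zero top byte, which is a
simplification. Addresses are handled as plain integers and never
dereferenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: the tag lives in bits 63:56 of the address. */
#define TAG_SHIFT 56
#define TAG_MASK  ((uint64_t)0xff << TAG_SHIFT)

/* Return addr with its top byte replaced by tag. */
static inline uint64_t tag_set(uint64_t addr, uint8_t tag)
{
	return (addr & ~TAG_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Extract the tag from the top byte. */
static inline uint8_t tag_get(uint64_t addr)
{
	return (uint8_t)(addr >> TAG_SHIFT);
}

/* Clear the tag (simplification: untagged means a zero top byte). */
static inline uint64_t tag_strip(uint64_t addr)
{
	return addr & ~TAG_MASK;
}

/* Compare two addresses while ignoring their tags. */
static inline bool same_address(uint64_t a, uint64_t b)
{
	return tag_strip(a) == tag_strip(b);
}

/* The condition a hypothetical compare-time check could warn about. */
static inline bool tags_differ(uint64_t a, uint64_t b)
{
	return tag_get(a) != tag_get(b);
}

void tag_demo(void)
{
	uint64_t base = 0x0000ffff12345678ULL;	/* arbitrary example address */
	uint64_t a = tag_set(base, 0x2a);
	uint64_t b = tag_set(base, 0x3c);

	assert(a != b);			/* raw comparison sees two different values */
	assert(same_address(a, b));	/* stripping the tags restores equality */
	assert(tags_differ(a, b));	/* what a checker could flag */
	/* Small offsets leave the tag in place; strip it afterwards. */
	assert(tag_strip(a + 8) == base + 8);
}
```

The raw comparison in tag_demo() is the kind of silent semantic change
discussed above: nothing faults, the result is just different from the
untagged case.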