On Tue, Mar 23, 2021 at 04:21:47PM +0000, Sean Christopherson wrote:
> I like the idea of pointing at the documentation. The documentation should
> probably emphasize that something is very, very wrong.

Yap, because no matter how we formulate the error message, it still
ain't enough and needs a longer explanation.

> E.g. if a kernel bug triggers EREMOVE failure and isn't detected until
> the kernel is widely deployed in a fleet, then the folks deploying the
> kernel probably _should_ be in all out panic. For this variety of bug
> to escape that far, it means there are huge holes in test coverage, in
> both the kernel itself and in the infrastructure of whoever is rolling
> out their new kernel.

You sound just like someone who works at a company with a big fleet, oh
wait... :-)

And yap, you big fleeted guys will more likely catch it but we do have
all these other customers who have a handful of servers only so they
probably won't be able to do such a wide coverage.

So I hope they'll appreciate this longer explanation about what to do
when they hit it. And normally I wouldn't even care but we almost never
tell people to reboot their boxes to fix sh*t - that's the other OS.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette