On Wed, 2019-06-12 at 13:00 +0200, Borislav Petkov wrote:
> On Wed, Jun 12, 2019 at 07:42:42AM -0300, Mauro Carvalho Chehab wrote:
> > That's said, from the admin PoV, it makes sense to have a single
> > daemon that collect errors from all error sources and take the
> > needed actions.
>
> Doing recovery actions in userspace is too flaky. Daemon can get killed
> at any point in time

So what? If root kills your RAS daemon, then so be it. That has never
been a problem on POWER8/POWER9 server platforms, and those have some of
the nastiest RAS in town.

You can kill PID 1 too, you know...

> and there are error types where you want to do recovery *before* you
> return to userspace.

Very few (precise examples, please), and I have yet to see why those
need some kind of magic coordinator.

> Yes, we do have different error reporting facilities but I still think
> that concentrating all the error information needed in order to do
> proper recovery action is the better approach here. And make that part
> of the kernel so that it is robust. Userspace can still configure it and
> so on.

Ben.