On Wed, 12 Jun 2019 13:00:39 +0200
Borislav Petkov <bp@xxxxxxxxx> wrote:

> On Wed, Jun 12, 2019 at 07:42:42AM -0300, Mauro Carvalho Chehab wrote:
> > That said, from the admin PoV, it makes sense to have a single
> > daemon that collects errors from all error sources and takes the
> > needed actions.
>
> Doing recovery actions in userspace is too flaky. Daemon can get killed
> at any point in time and there are error types where you want to do
> recovery *before* you return to userspace.

Yeah, some actions would work a lot better in kernelspace. Yet, other
actions would work a lot better if implemented in userspace.

For example, a server with multiple network interfaces may re-route
traffic to a backup interface if the main one has too many errors. This
can easily be done in userspace.

> Yes, we do have different error reporting facilities but I still think
> that concentrating all the error information needed in order to do
> proper recovery action is the better approach here. And make that part
> of the kernel so that it is robust. Userspace can still configure it and
> so on.

If the error reporting facilities are for the same hardware "group"
(like the machine's memory controllers), I agree with you: it makes
sense to have a single driver.

If they are for completely independent hardware, then implementing them
as separate drivers would work equally well, with the advantage of
making them easier to maintain and generic enough to support different
vendors using the same IP block.

Thanks,
Mauro