Alex, Alex Belits <abelits@xxxxxxxxxxx> writes: > This is a new version of task isolation implementation. Previous version is at > https://lore.kernel.org/lkml/07c25c246c55012981ec0296eee23e68c719333a.camel@xxxxxxxxxxx/ > > Mostly this covers race conditions prevention on breaking isolation. Early after kernel entry, > task_isolation_enter() is called to update flags visible to other CPU cores and to perform > synchronization if necessary. Before this call only "safe" operations happen, as long as > CONFIG_TRACE_IRQFLAGS is not enabled. Without going into details of the individual patches, let me give you a high level view of this series: 1) Entry code handling: That's completely broken vs. the careful ordering and instrumentation protection of the entry code. You can't just slap stuff randomly into places which you think are safe w/o actually trying to understand why this code is ordered in the way it is. This clearly was never built and tested with any of the relevant debug options enabled. Both build and boot would have told you. 2) Instruction synchronization Trying to do instruction synchronization delayed is a clear recipe for hard to diagnose failures. Just because it blew not up in your face does not make it correct in any way. It's broken by design and violates _all_ rules of safe instruction patching and introduces a complete trainwreck in x86 NMI processing. If you really think that this is correct, then please have at least the courtesy to come up with a detailed and precise argumentation why this is a valid approach. While writing that up you surely will find out why it is not. 3) Debug calls Sprinkling debug calls around the codebase randomly is not going to happen. That's an unmaintainable mess. Aside of that none of these dmesg based debug things is necessary. This can simply be monitored with tracing. 4) Tons of undocumented smp barriers See Documentation/process/submit-checklist.rst #25 5) Signal on page fault Why is this a magic task isolation feature instead of making it something which can be used in general? There are other legit reasons why a task might want a notification about an unexpected (resolved) page fault. 6) Coding style violations all over the place Using checkpatch.pl is mandatory 7) Not Cc'ed maintainers While your Cc list is huge, you completely fail to Cc the relevant maintainers of various files and subsystems as requested in Documentation/process/* 8) Changelogs Most of the changelogs have something along the lines: 'task isolation does not want X, so do Y to make it not do X' without any single line of explanation why this approach was chosen and why it is correct under all circumstances and cannot have nasty side effects. It's not the job of the reviewers/maintainers to figure this out. Please come up with a coherent design first and then address the identified issues one by one in a way which is palatable and reviewable. Throwing a big pile of completely undocumented 'works for me' mess over the fence does not get you anywhere, not even to the point that people are willing to review it in detail. Thanks, tglx