On Thu, 2020-07-23 at 15:17 +0200, Thomas Gleixner wrote:
>
> Without going into details of the individual patches, let me give you a
> high level view of this series:
>
> 1) Entry code handling:
>
>    That's completely broken vs. the careful ordering and instrumentation
>    protection of the entry code. You can't just slap stuff randomly
>    into places which you think are safe w/o actually trying to understand
>    why this code is ordered in the way it is.
>
>    This clearly was never built and tested with any of the relevant
>    debug options enabled. Both build and boot would have told you.

This is intended to avoid a race condition when entry to or exit from
isolation happens at the same time as an event that requires
synchronization. The idea is that it is possible to insulate the core
from all events while it is running an isolated task in userspace; it
will receive those calls normally after breaking isolation and entering
the kernel, and it will synchronize itself on kernel entry.

This has two potential problems that I am trying to solve:

1. Without careful ordering there will be a race condition with events
   that happen at the same time as kernel entry or exit.

2. The CPU runs some kernel code after entering but before
   synchronization. This code should be restricted to the early entry
   path that is not affected by the "stale" state, similar to how the
   IPI code that normally receives synchronization events handles it.

I can't say that I am completely happy with the amount of kernel entry
handling that had to be added. The problem is that I am trying to
introduce a feature that allows CPU cores to go into a "de-synchronized"
state while running isolated tasks, not receiving the synchronization
events that would normally reach them. This means there has to be an
established point on kernel entry at which it is safe for the core to
catch up with the rest of the kernel. Such a point may be useful for
other purposes as well, but at this time task isolation is the first
feature to need it, so I had to determine where that point is for every
supported architecture and method of kernel entry.

I have found that each architecture has its own way of handling this,
and sometimes individual interrupt controller drivers vary in their
sequence of calls on early kernel entry. For x86 I also have an
implementation for kernel 5.6, before your changes to the IDT macros.
That version is much less straightforward, so I am grateful for those
relatively recent improvements. Nevertheless, I believe that the goal
of finding those points and using them for synchronization is valid. If
you can recommend a better way, at least for x86, I will be happy to
follow your advice.

I have tried to cover kernel entry in a generic way while keeping the
changes as unintrusive as possible, and this is why it looks simple and
is spread over multiple places. I also had to do the same for arm and
arm64 (which I use for development), and for each architecture I had to
work out the sequences of entry points and function calls to determine
the correct placement of task_isolation_enter() calls in them. It is
not random, but it does reflect the complex nature of the kernel entry
code. I believe the RCU implementation faced somewhat similar
requirements for calls on kernel entry; it is not completely unified,
either.
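To make the intent more concrete, here is a minimal sketch of the kind
of entry-time synchronization point I mean. The helper name
task_isolation_enter() is the one used in the series; the per-CPU flag
name and the function body below are illustrative assumptions, not the
actual implementation:

#include <linux/percpu.h>

/*
 * Illustrative sketch only, not the actual patch code.
 *
 * A CPU that runs an isolated task skips the IPIs other CPUs use to
 * synchronize (e.g. after instruction patching), so the first thing
 * the kernel does on entry, before running anything that may depend
 * on that state, is catch up with the rest of the kernel.
 */
static DEFINE_PER_CPU(unsigned long, ll_isol_flags);

static inline void task_isolation_enter(void)
{
	/* Fast path: nothing to do if this CPU was not isolated. */
	if (likely(!raw_cpu_read(ll_isol_flags)))
		return;

	/* Leave the "de-synchronized" state. */
	raw_cpu_write(ll_isol_flags, 0);

	/*
	 * Make stores done by remote CPUs before they skipped the IPI
	 * visible here, then serialize the instruction stream.
	 */
	smp_mb();
	sync_core();	/* x86; other architectures use their equivalent */
}

The discussion above is only about where in each entry path such a call
can be placed safely, not about what it does internally.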
> 2) Instruction synchronization
>
>    Trying to do instruction synchronization delayed is a clear recipe
>    for hard to diagnose failures. Just because it blew not up in your
>    face does not make it correct in any way. It's broken by design and
>    violates _all_ rules of safe instruction patching and introduces a
>    complete trainwreck in x86 NMI processing.

The idea is that, just as synchronization events are handled by a
regular IPI, we already have some code that runs with the assumption
that it is safe to enter it in a "stale" state, before synchronization.
I have extended this to allow synchronization points on all kernel
entry paths.

> If you really think that this is correct, then please have at least
> the courtesy to come up with a detailed and precise argumentation
> why this is a valid approach.
>
> While writing that up you surely will find out why it is not.

I had to document the sequence of calls for every entry point on the
three supported architectures to determine the points for
synchronization. It is possible that I have missed something somewhere,
but I don't see a better approach, short of establishing kernel-wide
infrastructure for this. And even if we did just that, it would still
be possible to implement this kind of synchronization point calls
first, and convert them to something more generic later.

> 3) Debug calls
>
>    Sprinkling debug calls around the codebase randomly is not going to
>    happen. That's an unmaintainable mess.

Those report the causes of isolation breaking and are intended for
application and system debugging.

>    Aside of that none of these dmesg based debug things is necessary.
>    This can simply be monitored with tracing.

I think it would be better to make all of that information available to
the userspace application; however, I have based this on the Chris
Metcalf code and gradually updated its mechanisms and interfaces. The
original reporting of isolation breaking causes had far greater
problems, so at first I wanted to have something that produces easily
visible and correct reporting, and does not break things while doing
so.

> 4) Tons of undocumented smp barriers
>
>    See Documentation/process/submit-checklist.rst #25

That should be fixed.

> 5) Signal on page fault
>
>    Why is this a magic task isolation feature instead of making it
>    something which can be used in general? There are other legit
>    reasons why a task might want a notification about an unexpected
>    (resolved) page fault.

A page fault causes isolation breaking. When a task runs in isolated
mode, it does so because it requires predictable timing, so causing
page faults and expecting them to be handled along the way would defeat
the purpose of isolation. So if a page fault did happen, it is
important that the application receives a notification that isolation
was broken; it may then decide to do something about it, re-enter
isolation, etc.
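To illustrate why the notification matters to the application, here is
a hypothetical userspace sketch. The prctl() interface and the
PR_TASK_ISOLATION* constants are inherited from the Chris Metcalf
series and are not in mainline; the numeric values below are
placeholders, so treat the whole thing as an assumption about the
interface rather than a reference:

#include <signal.h>
#include <stdlib.h>
#include <sys/prctl.h>

/* Placeholder definitions; real values come from the patched uapi headers. */
#ifndef PR_TASK_ISOLATION
#define PR_TASK_ISOLATION		48
#define PR_TASK_ISOLATION_ENABLE	(1 << 0)
#define PR_TASK_ISOLATION_SET_SIG(sig)	(((sig) & 0x7f) << 8)
#endif

static volatile sig_atomic_t isolation_broken;

static void isol_sig(int sig)
{
	/* Isolation was broken, e.g. by a page fault or an interrupt. */
	isolation_broken = 1;
}

int main(void)
{
	signal(SIGUSR1, isol_sig);

	/* Ask the kernel to deliver SIGUSR1 whenever isolation breaks. */
	if (prctl(PR_TASK_ISOLATION,
		  PR_TASK_ISOLATION_ENABLE | PR_TASK_ISOLATION_SET_SIG(SIGUSR1),
		  0, 0, 0) < 0)
		exit(1);

	while (!isolation_broken) {
		/* Latency-sensitive loop: no syscalls, no page faults. */
	}

	/* Decide what to do: log the cause, re-enter isolation, etc. */
	return 0;
}

The application can then decide whether to re-enter isolation or give
up, which is the behavior described above.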
> 6) Coding style violations all over the place
>
>    Using checkpatch.pl is mandatory
>
> 7) Not Cc'ed maintainers
>
>    While your Cc list is huge, you completely fail to Cc the relevant
>    maintainers of various files and subsystems as requested in
>    Documentation/process/*

To be honest, I am not sure whom I have missed; I tried to include
everyone from my previous attempt.

> 8) Changelogs
>
>    Most of the changelogs have something along the lines:
>
>     'task isolation does not want X, so do Y to make it not do X'
>
>    without any single line of explanation why this approach was chosen
>    and why it is correct under all circumstances and cannot have nasty
>    side effects.

This is the same as the previous version, except for the addition of
kernel entry handling. As far as I can tell, the rest was discussed
before, and not many questions remained except for the race condition
on kernel entry. I agree that kernel entry handling is a complex issue
in itself, so I have included an explanation of the entry point and
function call sequences for each supported architecture. I also have a
longer call diagram that I used to track each particular function; it
probably should be included as a separate document.

> It's not the job of the reviewers/maintainers to figure this out.
>
> Please come up with a coherent design first and then address the
> identified issues one by one in a way which is palatable and reviewable.
>
> Throwing a big pile of completely undocumented 'works for me' mess over
> the fence does not get you anywhere, not even to the point that people
> are willing to review it in detail.

There is a design, and it is the result of careful tracking of calls in
the kernel source. It has multiple points where task_isolation_enter()
is called, for a reason similar to why RCU-related functions are called
in multiple places. If someone can recommend a better way to introduce
a kernel entry checkpoint for synchronization that did not exist
before, I will be happy to hear it.

--
Alex