Hey, Matt.

On Fri, Jul 22, 2011 at 04:19:53PM -0700, Matt Helsley wrote:
> parasitism is fine for a slow-but-sure debugger but is not suitable
> for checkpoint/restart.

Hmmm... okay, can you elaborate on that? I can't reach the same
conclusion from what you wrote below. You're implying parasitism would
be too slow for CR, right? But why would it be meaningfully slower or
faster than an in-kernel implementation?

> The difficulty of checkpoint/restart is not that the task has
> more information than the kernel.

But yes, it is, if you're trying to implement it from userland in a
transparent manner. A lot of information is not available to a
third-party process, and some of what is available is painfully slow
to get at (e.g. PTRACE_PEEKDATA/PTRACE_POKEDATA work word by word).

> Quite the contrary. Most of the "information" the task has that the
> kernel is not explicitly aware of is encoded in the task's
> memory. So long as the kernel faithfully restores memory and
> registers the task can know little the kernel doesn't already know.

Sure, the kernel ultimately knows and can access *everything*, but we
aren't talking about an in-kernel implementation here.

> One example of something the task knows that the kernel does not is
> which pids it cares about. However, a parasitic thread capable
> of checkpointing arbitrary processes won't know about these pids
> either -- it would have to be designed to checkpoint *only* the
> process it was injected into.

That's what the outer mechanism should provide regardless of how the
core CR is implemented. Maybe it's namespace based, maybe it's just
some subset of processes. It doesn't have much to do with the core
implementation.

> Furthermore, the kernel has information necessary for
> checkpoint/restart that the task does not. The composition of an
> epoll set is one example.

Again, sure, the kernel knows and can access everything, but most of
the necessary information is already available in userland.
If epoll information isn't available, let's export it. We already
have /proc/PID/fdinfo. If that's not the correct interface for
whatever reason, we can add introspection to epoll itself and make the
parasite query it. It's not like problems solve themselves
automatically if you put CR inside the kernel. Doing so side-steps a
lot of issues, mostly by allowing one to avoid difficult
userland-visible decisions, but as you already know well enough, I
think that does more harm than good.

Last year, when we were talking about a userland implementation, one
of the arguments was that the ptrace / jobctl interaction was too
messy and broken to be used for CR, but it's fixed now: the
interaction is well defined and jobctl states are fully capturable.
And really, before the fix, neither ptrace nor in-kernel CR could
capture those states properly; they were simply broken and not well
defined enough.

Identifying and fixing individual missing pieces is both more
beneficial to the kernel in general and much more likely to be merged
upstream. The ptrace change certainly took a lot more time than I
expected, but it was something that had been horribly broken for a
very long time and was very complex to deal with. I think the other
pieces, most of which should be about exporting more information via
some mechanism, should be much easier.

> So ptrace is just the wrong interface to base checkpoint/restart on.
> Pavel's approach, though I believe it is subtly flawed, is better.

Again, I just don't understand how you draw the above conclusion from
the arguments you provided. I don't see much connection between the
arguments and the conclusion.

Thanks.

-- 
tejun
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers